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past  literature  about  UAV  detection  and  pose  estimation  and  exploring  comparison  of  multiple 
state-of-the-art  algorithms.  The  thesis  presents  implementation  studies  of  detection  approaches 
including  color-based  detection  and  component-based  detection.  We  also  present  studies  of 
pose  estimation  methods  including  the  Posit  algorithm,  homography-based  detection,  and  the 
EPFL  non-iterative  method.  The  thesis  provides  a  preliminary  strategy  for  detecting  small  UAVs 
and  for  estimating  its  six  degree  of  freedom  (6DOF)  pose  from  image  sequences  within  the  pre¬ 
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CHAPTER  1; 
Introduction 


1.1  Motivation 

The  collision  of  the  Cessna  172  in  Visual  Flight  Rules  (VFR)  conditions  with  a  Cirrus  SR22 
in  Aug  10,  2008  put  to  the  test  “big  sky,  little  plane”  theory,  that  states  that  wide-open  spaces 
generally  prevent  two  aircraft  from  colliding.  The  National  Transportation  Safety  Board  deter¬ 
mined  that  the  probable  cause(s)  of  this  accident  was  the  failure  of  both  pilots  to  see  and  avoid 
each  other’s  aircraft  [1].  Table  1.1  gives  statistics  on  the  number  of  Mid-Air  Collision  (MAC) 
in  U.S.  airspace  between  1991  and  2002. 


Year 

MAC  Events 

Operating  Hours  (millions) 

Rate  per  10^  Flight  Hours 

1991 

18 

27.2 

0.66 

1992 

11 

24.8 

0.44 

1993 

13 

22.8 

0.57 

1994 

11 

22.2 

0.50 

1995 

14 

24.9 

0.56 

1996 

18 

24.9 

0.72 

1997 

13 

25.5 

0.51 

1998 

14 

26.8 

0.52 

1999 

15 

29.5 

0.51 

2000 

19 

30.8 

0.62 

2001 

5 

25.9 

0.19 

2002 

7 

25.9 

0.27 

Average 

13.17 

25.93 

0.51 

Table  1.1:  This  table  shows  the  number  of  MAC  for  each  year  starting  from  199 1  to  2002  for  the  number 
of  operation  hours.  We  can  see  that  the  year  1991  has  the  highest  MAC  rate  and  2001  has  the  lowest  rate 
(from:  [2]). 


The  introduction  of  Unmanned  Aerial  Vehicle  (UAV)s  into  the  National  Airspace  (NAS)  is 
challenging  for  the  Federal  Administration  Agency  (FAA)  and  the  aviation  community.  Under 
existing  regulations  in  the  U.S.,  UAV  operations  require  special  approval  if  operating  higher 
than  500  feet  Above  Ground  Level  (AGL)  or  over  populated  areas  [3].  Furthermore,  UAV  oper¬ 
ations  beyond  Line-of-Sight  (LOS)  may  require  an  automated  sense-and-avoid  (S&  A)  system 
due  to  potential  communications  latencies  or  failures.  FAA  policy  to  date  has  been  to  enforce 
highly  restrictive  operating  requirements  that  involve  segregating  UAV  operations  from  those 
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of  piloted  aircraft,  and  confining  the  former  to  specially  designated  airspaces.  Table  1.2  depicts 
the  azimuth  and  elevation  boundaries. 


Source 

Azimuth 

Elevation 

FAA  P-8740-51:  How  to  Avoid 
a  Mid-Air  Collision 

+!-  60  degrees 

+!-  10  degrees 

International  Standards, 

Rules  of  the  Air,  Section  3.2  (ICAO) 

+!-  110  degrees 

No  guidance 

FAA  Advisory  Circular  25.773-1 
(Transport  Aircraft  Design) 

+!-  120  degrees 

Variable:  -i-37  and  -25 

degrees  (varies  with  azimuth) 

Table  1.2:  Existing  Guidance  for  Detection  Search  Areas  (from:  [2]) 


In  addition  to  the  safety  concern,  a  major  threat  is  that  future  warfighters  may  have  to 
face  a  UAS -equipped  opponent  [4]  since  many  players  have  been  active  in  acquiring  their  own 
UAVs,  requiring  allied  nations  to  refine  techniques  to  counter  hostile  drones.  In  this  respect 
Department  of  Defense  (DoD)  has  begun  a  series  of  research  projects  and  exercises  to  test  out 
what  functionality  should  the  UAV  offer  to  detect  and  counter  such  a  threat.  Thus  the  research 
detailed  in  this  thesis  is  part  of  an  ongoing  research  project  on  “Swarm  vs.  Swarm  Unmanned 
Systems.”  In  this  thesis  UAV  to  UAV  detection  is  considered  one  of  the  most  important  issues 
to  be  addressed  for  both  collision  avoidance  and  any  manoeuvre  against  enemy  UAV.  The 
DoD  performs  a  yearly  exercise  known  as  “Black  Dart,”  which  aims  to  check  the  ability  of  a 
variety  of  manned  and  unmanned  aerial  systems  to  detect  and  identify  enemy  Unmanned  Aerial 
System  (UAS)  platforms  in  flight.  Figure  1.1  illustrates  the  overall  components  of  the  project 
and  fits  this  thesis’s  research  into  the  context  of  the  larger  project. 

Whether  the  case  is  to  safely  fly  UAVs,  in  the  NAS  or  to  counter  enemy  UAS,  an  auto¬ 
mated  “sense  and  avoid”  system  is  key  to  enabling  UAVs  to  routinely  fly  within  shared  airspace. 
That  is,  an  S&A  system  should  be  able  to  distinguish  the  target  UAV  from  the  background  with 
a  performance  that  equals  or  exceeds  that  of  a  human  pilot,  as  stated  in  flight  regulations.  Such 
a  detection  system  should  provide: 

•  High  detection  rates 

•  Low  false  alert  rates 

•  Early  detection 

•  Real-time  operation 
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4'  Common  Informational  picture 


Figure  1.1:  Swarm  vs.  Swarm  Unmanned  Systems:  This  figure  shows  how  the  whole  project  is  organized 
and  subdivided  into  smaller  projects.  In  most  cases,  these  smaller  projects  are  theses  being  worked  on 
by  Naval  Postgraduate  School  students..  This  thesis  addresses  local  perception,  which  deals  with  target 
segmentation,  relative  coordinates,  and  pose  estimation  using  computer  vision. 


A  general  deseription  of  a  sense  and  avoid  system  will  involve  the  following  steps: 

•  Sensing:  obtaining  aeeurate  and  reliable  information  about  the  airspaee  where  the  UAV 
operates. 

•  Deteetion:  Identify  and  prioritize  eollision  threats  based  on  the  information  sensed. 

•  Aetion:  Determining  Manoeuvre  neeessary  to  maintain  safe  separation  or  to  engage  in 
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some  kind  of  operations. 


Figure  1 .2  illustrates  the  general  deseription  of  sueh  a  sense  and  avoid  system. 


Figure  1.2:  General  S&A  system  (from:  [5]):  It  is  a  Sensing  Detection  Threat  identification  and  Avoid¬ 
ance  (a.k.a.  SDTA)  loop  where  the  surrounding  environment  is  continuously  sensed,  potential  obstacles 
are  detected  followed  by  a  decision  whether  they  are  considered  as  threat  or  not  and  then  the  avoidance 
manoeuvre  is  performed 


The  detection  step  is  considered  one  of  the  most  important  component  of  the  collision 
avoidance  system  because  of  the  critical  information  that  it  provides  to  systems  responsible  for 
determining  and  executing  appropriate  action  [5]. 


1.2  Literature  Review 

The  global  objective  of  the  Collision  Avoidance  System  (CAS)  is  to  allow  UAVs  to  operate 
safely  within  the  non-segregated  civil  and  military  airspace  on  a  routine  basis.  Generally  speak¬ 
ing,  a  CAS  can  be  achieved  via  cooperative  or  non-cooperative  means,  and  the  physical  sensors 
that  collect  the  information  may  be  categorized  as  either  active  or  passive. 


1.2.1  Sensors 

UAV  target  designation  devices  are  either  passive  or  active  sensors.  Passive  sensors  rely  on 
radiated  energy  from  the  targets  of  interest.  Passive  sensors  include  optical  cameras  (such  as 
CMOS  or  CCD  cameras),  infrared  cameras  such  as  forward  looking  infrared  (FLIR),  and  radio 
receivers  as  well  as  radio  direction  finding  equipment.  Active  sensors,  on  the  other  hand,  re¬ 
quire  interaction  with  targets.  They  transmit  energy  and  detect  the  energy  reflected  directly  or 
indirectly  from  these  targets.  There  are  several  target  acquisition  sensors  which  can  be  carried 
by  each  UAV  as  a  payload.  They  are: 
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Electro-Optics 

Optical  sensors  include  both  imaging  and  non-imaging  sensors;  many  are  based  on  the  use  of 
lasers,  imaging  systems.  Among  these  sensors  we  can  mention  FLIR  or  thermal  imagers,  which 
is  a  device  that  enhances  target  acquisition  in  low  visibility  situation  utilizing  heat  or  thermal 
images  from  the  target.  Many  UAVs  carry  IR  sensors  on  their  payloads  that  detect  heat  as  is  the 
case  of  the  Global  Hawk  UAV  [6]  (along  with  synthetic-aperture  radar  as  well  as  electro-optical 
sensors).  As  a  case  study  of  the  IR  sensor’s  effectiveness,  it  was  used  to  assess  the  damage 
inside  the  Japanese  nuclear  reactors  after  the  2011  earthquake  and  tsunami.  Its  IR  sensors  were 
able  to  acquire  images  of  the  reactors  showing  which  parts  of  them  were  at  what  temperatures. 
IR  requires  heat  from  an  object  to  perform  object  detection,  which  is  most  suitable  for  night¬ 
time  use.  In  addition  to  FLIR,  we  can  mention  high  resolution  charge  coupled  device  (CCD) 
cameras,  which  can  help  target  detection  and  recognition  during  daylight  where  weather  is  clear. 
It  requires  an  advanced  image  processing  and  pattern  recognition  algorithm  to  analyze  picture 
and  video  imagery  for  target  recognition  tasks. 

Radar/Lidar 

Radar  is  the  sensor  of  choice  for  long-range  collision  avoidance  [2];  the  problem  with  radar  is 
that  long-range  sensors  require  a  lot  of  power  and  beam  localization  requires  a  large  antenna 
neither  of  which  is  suitable  for  small  UAVs.  Some  work  has  been  done  to  miniaturize  radar  so 
that  it  will  fit  in  small  UAVs,  but  it  is  still  in  the  experimental  phase.  Synthetic  aperture  radar 
(SAR)  and  Light  Detection  And  Ranging  (Lidar)  are  also  used  as  active  sensors  for  various 
missions,  especially  for  target  recognition  and  detection.  Such  active  sensors  can  be  helped  by 
the  use  of  a  moving  target  indicator  (MTI)  which  enables  the  radar  to  designate  a  moving  target 
and  engage  the  target  continuously  until  a  decision  is  made  concerning  the  target  or  it  is  outside 
the  radar’s  detection  range. 

Acoustic  Sensors 

Acoustic  wave  sensors  are  electronic  devices  that  can  measure  sound  levels  [7].  When  an  acous¬ 
tic  wave  travels  through  a  certain  material  or  along  the  surface  of  a  material,  it  is  influenced  by 
the  different  material  properties  and  obstacles  it  travels  through.  Any  changes  to  the  charac¬ 
teristics  of  this  path  affect  the  velocity  and/or  amplitude  of  the  wave.  These  characteristics  are 
translated  into  a  digital  signal  (output)  using  transducers  and  can  be  monitored  by  measuring 
the  frequency  or  phase  characteristics  of  the  sensor.  These  changes  can  then  be  translated  to 
the  corresponding  physical  differences  being  measured.  For  example.  Scientific  Applications  & 
Research  Associates,  Inc.  has  developed  a  compact  acoustic  sensor  system  for  detecting  aircraft 
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known  as  Passive  Acoustic  Non-Cooperative  Collision- Alert  System  (PANCAS)  [2].  PANCAS 
works  by  detecting  the  noise  generated  by  aircraft.  This  system  shows  good  performance  in 
adverse  weather  and  battlefield  conditions,  but  it  suffers  from  low  bearing  resolution.  The  data 
gathered  from  acoustic  sensor  can  be  used  to  cue  optical  sensors  and  other  narrower  Field  Of 
View  (FOV)  sensors  to  potential  targets. 

1.2.2  Operational  Mode 

Cooperative  Collision  Detection 

A  cooperative  collision  detection  system  includes  all  communications  equipment  that  enables 
exchange  information  (such  as  position,  heading,  speed  and  waypoints)  between  the  cooper¬ 
ative  aircraft.  Among  these  devices  there  is  the  Traffic  Alert  and  Collision  Avoidance  Sys¬ 
tem  (TCAS)  [8],  which  is  based  on  a  transponder  and  provides  a  highly  reliable  detection  sensor 
for  cooperative  threats.  Other  emerging  technologies  include  Airborne  Separation  Assistance 
Systems  (ASAS)  and  Automatic  Dependent  Surveillance  Broadcast  (ADS-B)  [9]  [10].  The 
Military  Airborne  Surveillance  System  (MASS)  is  the  military  counterpart  to  TCAS.  Though 
designed  for  manned  aircraft,  TCAS  will  likely  work  with  larger  UAVs  though  it  may  present 
problems  with  smaller  UAVs,  which  have  limited  payloads  and  low  electrical  generation  ca¬ 
pabilities.  Another  drawback  of  TCAS  is  seen  when  one  of  the  UAVs  fails  to  cooperate  with 
others;  in  this  case,  the  system  becomes  useless. 

Non-Cooperative  Collision  Detection 

In  non-cooperative  collision  detection  systems,  the  solution  requires  new  sensors  from  the  avail¬ 
able  technologies  including  laser  range  finders,  optical  flow  sensors  such  as  Electro-Optical/Infra- 
Red  (EO/IR),  radar  systems  or  stereo  camera  pairs,  or  single  moving  camera.  These  sensors 
provide  situation  awareness  about  the  surrounding  environment  and  all  the  processing  and  de¬ 
cisions  are  made  by  the  concerned  aircraft  without  cooperation  or  feedback  from  others. 

1.2.3  Related  Works 

This  section  introduces  an  extensive  UAV  target  detection  literature  followed  by  works  related 
to  UAV  pose  estimation  while  focusing  on  computer  vision  methods  as  the  selected  approach. 

Target  Detection 

There  has  been  extensive  research  on  the  subject  of  target  detection  and,  more  precisely,  UAV 
collision  avoidance.  Current  research  in  this  field  has  experimented  with  a  variety  of  sensor 
technologies,  such  as  radar,  laser  range  finders,  sonar  sensors,  infrared  sensors,  and  cameras. 
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The  use  of  long-range  high-resolution  laser  seanners  on  larger  UAVs  provided  sueeess- 
ful  navigation  in  eluttered  environments  [11].  Nevertheless,  these  UAVs  are  eonstrained  to  use 
most  of  their  payload  eapability  to  earry  the  laser  seanner.  Although  laser  seanners  have  been 
miniaturized  for  use  on  small  and  Miero  Unmanned  Vehieles,  the  result  is  a  loss  of  both  resolu¬ 
tion  and  sensing  direetions  [12].  Radar  has  been  also  used  as  a  reliable  deteetion  sensor  suitable 
for  large  UAVs,  to  the  author’s  knowledge  there  is  no  miniaturized  implementation  of  radar  to 
fit  in  small  UAVs;  in  addition,  long-range  radar  requires  signifieant  power  and  a  large  antenna 
to  localize  the  beam. 

The  first  feasibility  study  of  hear-and-avoid  was  presented  in  [13]  using  acoustic  vector 
sensors  on  small  UAVs;  the  study’s  findings  suggest  that  sounds  as  produced  by  civil  aircraft 
will  be  clearly  detectable  by  the  sensors,  but  that  will  not  be  effective  for  electrically  powered 
UAVs  that  produce  very  little  sound. 

An  extensive  focus  has  been  given  to  the  use  of  cameras  for  outdoor  UAV  sense-and- 
avoid.  This  is  because  the  camera  is  a  passive  sensor  that  consumes  less  energy  compared  to 
radar,  laser  scanners,  and  active  sensors  in  general  and  also  because  it  has  a  relatively  light 
weight  and  fits  easily  in  small  UAVs. 

S.  Grzonka  et  al.  [12]  used  passive,  stereo  image-based  obstacle  detection.  The  main 
drawback  of  the  use  of  stereo  vision  is  that  it  has  a  limited  detection  range,  which  depends  on 
the  resolution  of  the  images  and  also  the  distance  between  the  two  cameras.  Symington  et  al. 
presented  in  [14]  a  probabilistic  target  detection  by  camera-equipped  UAVs  which  considered 
the  problem  of  detecting  and  tracking  a  static  target  from  a  camera  mounted  below  a  quadrotor 
UAV.  For  purposes  of  image  registration,  Speeded-Up  Robust  Features  (SURF)  were  used  [15]. 

The  results  of  experiments  with  multi-source  remote  sensing  images  show  that  the  ap¬ 
proach  used  can  achieve  acceptable  accuracy  and  satisfy  real-time  demand.  However,  registra¬ 
tion  error  is  difficult  to  analyse  because  it  is  hard  to  distinguish  wrong  features  from  features  of 
the  area  with  local  distortion.  To  solve  this  problem,  a  better  feature-matching  method  should 
be  researched  to  eliminate  incorrectly  matched  pairs  [15]. 

Dae-Yeon  and  Tahk  explored  in  [16]  the  use  of  a  light  source-aided  computer  vision 
system  for  UAV  landing  using  some  colored  LED  (red  and  green)  on  the  edge  of  the  recovery 
net  and  devising  an  algorithm  to  accurately  detect  these  lights  using  image  processing  stage. 

Continued  advances  in  the  fields  of  image-processing  and  computer  vision  have  pro- 
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moted  their  suitability  for  target  detection.  That  is,  extensive  work  has  been  published  such 
as  in  [17]  and  [18].  The  majority  of  the  research  that  has  been  published  on  the  topic  of  au¬ 
tomatic  target  detection  using  computer  vision  employs  different  approaches  including  spatial 
techniques,  such  as  morphological  filtering  [19]  which  used  a  close-minus-open  filter  to  extract 
point-like  features  from  large-scale  clutter,  such  as  clouds.  The  preliminary  analysis  of  the  data 
set  has  yielded  encouraging  results,  achieving  first  detection  times  at  distances  of  approximately 
6.5km  (3.5nmi),  and  median  filters  [20]  which  deal  with  the  problem  of  detection  and  tracking 
of  low  observable  small-targets  from  a  sequence  of  IR  images  against  a  structural  background 
and  non-stationary  clutter,  and  temporal-based  techniques  such  as  3D  matched  filtering  [21]  and 
dynamic  programming  [22]. 

Pose  Estimation 

Pose  estimation  is  considered  to  be  a  main  function  for  UAV  navigation  which  aims  to  determine 
the  position  in  term  of  x,  y,  and  z  coordinates  and  orientation  in  term  of  roll,  pitch,  and  roll  angles 
(Chapter  2  provides  more  details  about  the  subject)  of  an  object,  i.e.,  its  six  Degrees  of  Freedom 
(6DOF).  There  are  many  approaches  and  technologies  to  detect  and  track  the  pose  of  an  object. 
For  example,  mechanical,  magnetic,  inertial,  vision,  and  hybrid  solutions  exist,  each  having  its 
pros  and  cons  [23].  Current  systems  use  Inertial  Measurement  Unit  (IMU)s  with  Differential 
Global  Positioning  System  (DGPS);  however,  DGPS  requires  an  active  communication  channel 
between  the  two  aircraft. 

There  has  been  a  variety  of  indoor  pose  estimation  systems  developed  for  aerial  vehicles. 
A  well-known  algorithm  for  pose  estimation  is  the  Posit  (Pose  with  Iterations)  algorithm,  which 
estimates  the  pose  of  the  camera  with  respect  to  an  object  and  optimizes  the  error  by  using  an 
iterative  process.  This  algorithm  will  be  one  of  the  approaches  studied  and  implemented  in  the 
current  thesis. 

Much  of  the  prior  research  in  using  visual  information  for  pose  estimation  has  been  in 
Vision  Odometry  (VO)  where  a  pose  is  computed  directly  from  the  motion  of  points  in  the 
camera  view.  Another  approach  focuses  on  fusing  information  from  a  camera  with  a  GPS/IMU 
system  to  achieve  more  accurate  pose  estimation.  In  this  respect,  two  fusion  approaches  were 
used: 


•  The  first  uses  visual  information  to  gauge  the  accuracy  of  the  GPS/IMU  estimates  [24], 
which  requires  that  the  3D  location  of  points  in  the  camera  view  be  accurately  estimated. 

•  The  second  approach  directly  estimates  pose  by  aligning  successive  images;  this  method 
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can  reach  accurate  results  but  requires  extensive  computation  time. 

Computer  vision  (a.k.a  CV)  has  been  largely  used  in  pose  estimation  for  both  co-planar 
and  non-coplanar  features  points.  In  1995,  Daniel  DeMenthon  published  a  method  called  Pose 
from  Orthographic  Projection  and  Scaling  with  Iterations  (Posit)  [25]  for  non-coplanar  points. 
Its  is  a  fast  method  that  works  with  orthographic  projection  instead  of  perspective,  which  is 
simpler  to  work  with  and  can  approximate  the  correct  pose.  DeMenthon  also  developed  another 
method  [26]  because  the  first  version  of  Posit  does  not  work  well  for  planar  or  close  to  planar 
point  clouds.  The  coplanar  case  will  be  the  focus  of  this  thesis,  since  the  shape  of  the  UAV  of 
interest  is  nearly  planar. 

Yang  et  al.  [27]  used  computation  of  homography  between  planes  for  pose  estimation 
based  on  four  coplanar  points.  This  study  achieved  promising  results  in  determining  the  pose  of 
four  markers  on  a  compact  disk.  Vincent  Lepetit  et  al.  developed  a  non-iterative  pose  estima¬ 
tion  [28]  method  that  works  well  for  many  points,  since  the  system  of  equations  has  the  same 
computation  time  for  higher  amount  of  points.  This  method  uses  four  control  points  that  are  not 
coplanar,  and  it  defines  all  points  according  to  these  control  points  so  only  the  control  points 
need  to  be  found. 

1.3  Main  Contributions 

This  thesis  shows  the  feasibility  of  UAV  detection  and  pose  estimation  using  commercially 
available  off-the-shelf  (COTS)  camera  and  computer  vision  approaches  and  libraries.  The  re¬ 
sults  of  this  thesis  can  be  used  as  a  starting  point  in  a  future  design  and  implementation  of 
real-time  vision  based  UAV  detection  and  pose  estimation.  This  thesis  serves  also  as  proof 
of  concept  and  foundation  for  computer  vision  techniques  for  both  UAV  detection  and  pose 
estimation. 

1.4  Organization  of  Thesis 

Chapter  1  provides  background  and  condensed  literature  on  the  various  components  of  collision 
avoidance  system.  Chapter  2  presents  conventions,  coordinates  frames  and  camera  model  and 
projections  that  will  be  used  by  computer  vision  algorithms.  Chapter  3  explores  two  computer 
vision  algorithms  for  UAV  detection,  and  presents  their  implementation  and  experimental  re¬ 
sults.  Chapter  4  presents  three  pose  estimation  methods,  discusses  the  simulation  environment, 
and  evaluates  the  performance  of  each  algorithm.  Chapter  5  summarizes  findings  and  explores 
follow-on  research  questions.  The  thesis  organization  is  depicted  graphically  in  Figure  1.3. 
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Figure  1.3:  Thesis  Organization:  After  presenting  a  literature  review,  a  description  of  different  sce¬ 
narios  and  conventions  used  is  provided.  UAV  to  UAV  detection  methods  are  discussed,  algorithms 
implementation  are  provided,  and  results  are  discussed.  Three  pose  estimation  approach  are  analyzed, 
implemented  and  tested.  The  last  chapter  discusses  conclusions  and  recommendations. 
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CHAPTER  2: 
Model  Formulation 


This  chapter  details  the  modeling  process  for  target  detection  and  pose  estimation.  Included  in 
the  chapter  are  descriptions  of  scenarios  and  evaluation  measures,  different  coordinate  frames, 
camera  models,  and  projections  that  will  be  used  in  computer  vision  algorithms  to  perform  both 
detection  and  pose  estimation. 

2.1  Scenario  Development 

While  flying  in  a  shared  airspace  the  UAV  should  be  able  to  detect  and  localize  the  target  UAV 
in  both  collision  course  where  target  flies  head  on  directly  toward  the  host  UAV  (Figure  2.1)  and 
crossing  object  manoeuvre  where  target  UAV  flies  perpendicular  to  the  host  UAV  (Figure  2.2). 
The  work  will  be  based  on  the  following  assumptions: 

•  Flight  speed  is  fixed  at  around  50  knots 

If  the  velocity  of  the  host  UAV  is  denote  by  14,  and  the  velocity  of  the  target  UAV)  as  Vj 
then  Vfi  =  Vt  =  50  knots. 

•  Airspace  dimensions  are  4km  by  500m  by  2km 

•  No  counter  manoeuvre  or  chasing  is  envisaged 

Minimum  Time  for  Detection 

In  the  case  of  collision  course  manoeuvre,  the  target  UAV  will  not  appear  to  be  moving  in  the 
image  plane  because  of  its  low  relative  velocity.  For  this  reason,  the  detection  will  be  difficult 
and  the  time  left  to  both  detect  and  perform  any  action  will  be  short.  This  time  is  function  of  the 
two  vehicles’  range  of  speeds  and  banking  angles  (banking  angle,  or  roll,  is  the  angle  at  which 
the  vehicle  is  inclined  about  its  longitudinal  axis  with  respect  to  its  path). 

In  order  to  avert  collision,  the  host  UAV  needs  to  begin  the  avoidance  manoeuvre  no 
less  than  tmin_avoid  seconds,  which  is  translated  as  starting  the  manoeuvre  when  the  target  UAV 
is  outside  the  radius  =  Vcios  tmin_avoid>  where  Vcios  is  the  closing  velocity  (assuming  exact 
head-on  collision  course,  as  in  Figure  2.1),  given  by  the  following  equation  [2]: 


14loS  —  14  T 


(2.1) 
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It  has  been  shown  in  [2]  that,  for  for  the  instantaneous  banking  angle  ^max,  the  minimum  re¬ 
quired  time  to  make  deeision  to  avoid  the  target  UAV.  (It  is  the  last  ehanee  to  avoid  a  eollision). 

tmin_avoid  ~  ^min  COt(0niax)  —  5.6  0max  (2.2) 

The  approximate  distanee  between  the  two  aireraft  at  the  start  of  the  eollision  avoidance  ma¬ 
noeuvre  must  be: 


^avoid  ^clos  tmin_avoid  ^min  COt(0niax)  —  5.6Vclos  \/  ^/2  *^^max  (2.3) 

These  computed  parameters  provide  constraints  for  the  real-time  computation  required 
to  detect  and  identify  potential  targets  in  real-time  which  will  be  measure  of  effectiveness  of 
any  proposed  approach. 


Vt 


Target  UAV 


Vh 


/K 


i 

^  avoid 


Host  UAV 


Figure  2.1:  Collision  Course  Manoeuvre:  Both  host  and  target  UAV  are  flying  at  constant  speed  v,  =  v/, 
=  50  knots.  They  are  flying  head  on  a  collision  course.  The  distance  davoid  is  the  threshold  for  collision 
avoidance 


Minimum  Detection  Arc  Width 

This  section  attempts  to  determine  what  the  size  of  the  target  UAV  must  be  in  order  to  allow 
detection  to  avoid  a  probable  collision.  The  angle  a  subtended  by  the  target  UAV  of  width  w  as 
viewed  at  a  distance  d  [2]  is  given  by 

a  =  w/d  (2.4) 
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Target  UAV 


HostUAV 


Figure  2.2:  Crossing  Manoeuvre:Target  UAV  is  flying  perpendicular  to  the  flight  direction  of  the  host 
UAV.  Both  UAVs  are  flying  at  the  same  speed  Vt  -  v/,  =50  knots. 

This  case  uses  a  fixed  CCD  eamera  so  no  revisit  time:  sinee  it  is  not  a  seanning  sensor.  The 
total  time  eould  take  up  to 

hotal  ~  ^detect  T  tmin_avoid 

And  the  angle  subtended  in  this  situation  is: 

U  =  W  j  (hotal  *  ^clos) 

Therefore  for  a  given  target  UAV  with  a  veloeity  Vt,  the  sensing  system  needs  to  be  able  to 
deteet  it  whose  are  width  is  at  least  as  small  as  a  ealeulated  above. 

2.2  Conventions 

2.2.1  Coordinate  Systems 

Multiple  eoordinate  systems  are  often  used  to  deseribe  the  positions  of  an  objeet  in  a  eomplex 
environment. 

UAV  Body  Coordinate  Frame 

The  UAV  body  frame  is  noted  by  {B}  and  has  its  eoordinates  axis  ,  Z®  as  shown  in 

Figure  2.3.  The  rotation  angles  are  defined  as: 

•  roll:  (j)^  refers  to  whether  the  body  is  upside-down  or  not,  i.e.,  its  orientation  within  the 


(2.5) 


(2.6) 
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z® 

Yaw 

Y® 

Roll 


Pitch 

yB 

Figure  2.3:  UAV  Body  Coordinate  Frame:The  frame  origin  is  at  the  UAV’s  center  of  gravity,  the  v-axis 
points  forward  along  the  longitudinal  axis  of  the  aircraft,  the  y-axis  points  outwards  towards  the  right 
wing,  and  the  z-axis  is  in  the  upward  direction. 

plane,  or  rotating  around  the  axis 

•  pitch:  6^  refers  to  whether  the  body  is  tilted,  i.e.,  its  orientation  within  the  plane, 
or  rotating  around  the  7®  axis 

•  yaw:  cp^  refers  to  the  direetion  in  whieh  the  body  is  faeing  i.e.,  its  orientation  within  the 

plane,  or  rotating  around  the  Z®  axis. 

Camera  Coordinate  Frame 

This  is  a  right-hand  orthogonal  eoordinate  system  whose  origin  is  loeated  at  the  foeal  point  of 
the  eamera,  where  the  X'^-axis  points  forward  along  the  longitudinal  axis  of  the  camera,  the 
y^-axis  points  outwards  toward  the  right  hand  side,  and  the  Z^-axis  points  downward  from  the 
origin  (Figure  2.4). 

Image  Plane  Coordinate  Frame 

The  camera  model  used  in  this  thesis  is  the  pinhole  camera.  This  model  can  be  used  to  map 
a  three  dimensional  coordinates  of  a  series  of  three  dimensional  coordinates  of  a  point  M,, 
represented  in  the  camera  coordinate  frame  to  two  dimensional  coordinates  in  Image  Plane 
Coordinate  Frame. 

The  coordinates  of  an  object  point  M,  in  the  camera  coordinate  frame  is  Mf  =  [Xf,  Yf,  Zf]  ^ 
and  its  projection  onto  the  image  plane  is  the  point  m,  whose  coordinate  in  the  image  plane  co¬ 
ordinate  frame  is  as  shown  on  Figure  2.4.  The  equation  that  expresses  the  mapping 


14 


Figure  2.4:  Camera  and  Image  Plane  Coordinate  Frame  (after:  [29]):  The  point  c  is  the  origin  of  camera 
coordinate  frame,  or  the  camera  center.  The  axis  is  called  the  principal  axis  or  optical  axis  for  the 
camera  system.The  principal  point(p)  is  where  the  optical  axis  intersects  the  image  plane.  It  establishes 
the  origin  of  the  image  plane  coordinate  system.The  image  plane  coordinate  frame  is  defined  hy  the 
origin  p  ,  the  x-axis  ,  and  the  y-axis  .  The  distance  between  the  point  c  and  p  is  called  focal 
length(/). 


from  camera  coordinate  frame  to  image  plane  coordinate  frame  is: 


fi 


=  //Zpx 


Y 

Yp 


(2.7) 


2.2.2  Camera  Models 

From  a  computer  vision  stand  point  and  in  order  to  work  with  a  given  camera,  a  camera  model 
and  several  camera  related  parameters  must  be  fixed  and  computed  in  advance. 


Pinhole  Camera  Model 

Also  known  as  Camera  Obscura  model,  the  pinhole  camera  model  represents  the  mathematical 
relationship  between  the  coordinates  of  a  3D  point  and  its  projection  onto  the  image  plane. 
The  basis  of  projection  comes  from  a  model  where  all  light  from  a  scene  passes  through  a 
single  point  and  projects  an  inverted  image  on  the  opposite  side.  Figure  2.4  illustrates  a  pinhole 
camera  model  where  the  image  points  are  in  front  of  the  camera  center.  A  3D  point  is  projected 
through  origin  C  to  the  image  plane  (G)  along  the  relationship  between  the  3D  and  2D 
coordinates  is  given  by  Equation  2.1 . 
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Camera  Calibration 

The  purpose  of  camera  calibration  is  to  determine  the  intrinsic  camera  parameters  that  affect 
the  imaging  process.  These  parameters  are: 

•  Position  of  image  center  on  the  image:  commonly  noted  by  {cx,Cy)  which  are  the  coordi¬ 
nates  of  points  p  in  Figure  2.4 

•  Different  scaling  factors  for  row  pixels  {sx)  and  column  pixels  {sy),  i.e.,  pixels  are  not 
typically  square 

•  Focal  length:  denoted  by  /  having  two  scaled  components  fx=f/sx  and  fy=f/sy 

•  Lens  distortion:  represents  the  radial  distortion  coefficient 


For  a  CCD  camera  the  lens  distortion  is  assumed  to  be  zero.  The  corresponding  camera  calibra¬ 
tion  matrix  is  given  by: 


K  = 


/  fx 

0 

V  0 


0 

fy 

0 


Cy 

1  / 


(2.8) 


In  the  current  thesis  the  scaling  factors  are  assumed  to  be  equal  to  one  and  therefore  f  =  fx  —  fy 
There  is  a  very  large  body  of  literature  on  the  subject  of  camera  calibration,  mainly  influenced 
by  Roger  Tsai’s  work  [30].  These  can  be  roughly  classified  into  two  main  methods: 


•  conventional  method:  uses  ID  and  3D  data  to  compute  camera  parameters,  may  work 
with  single  view  of  image. 

•  camera  self-calibration:  only  need  image  data  for  camera  calibration  but  need  image  data 
from  multiple  views. 


Several  calibration  toolboxes  has  been  implemented  in  both  Matlab  and  C  as  in  [31].  This 
work  has  used  a  publicly-available  calibration  toolbox  created  by  George  Vogiatzis  and  Carlos 
Hernandez  and  can  be  found  on  [32].  A  detailed  description  of  the  implementation  will  further 
be  discussed  in  Chapter  4. 


2.2.3  Projection  Models 

Orthographic  Projection 

In  an  orthographic  projection,  a  camera  is  at  infinite  distance  from  a  3D  scene.  All  projection 
lines  are  parallel  to  optical  axis  Xj  =  Xf  and  yJ  =  Yf  as  seen  in  Figure  2.5  (because  the  focal 
length  is  neglected  in  front  of  the  infinite  distance).  Orthographic  projections  are  a  small  set  of 
transformations  often  used  to  show  profile,  detail,  or  precise  measurements  of  a  3D  object. 
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yC 


Figure  2.5:  Orthographic  Projection  (after:  [29]):  In  this  projection  camera  is  at  infinite  distance  from 
the  3D  world  and  all  projections  lines  are  parallel  to  the  optical  axis  (Z^) 


Perspective  Projection 

A  distant  object  in  perspective  projection  appears  smaller  than  objects  nearby.  In  general,  a 
camera  frame  is  not  aligned  with  the  world  frame.  Image  frame  is  not  at  image  center,  but  it  is 
still  parallel  to  the  camera  frame.  Perspective  projection  equations  for  point  Mi  (Figure  2.6)  are 


Xff/Z 

(2.9) 

yff/zf 

(2.10) 

yC 


Figure  2.6:  Perspective  Projection  (after:  [29]):  A  3D  feature  point  M,  is  projected  onto  an  image  plane 
(G)  on  point  m,  with  perspective  rays  originating  at  the  center  of  projection  (c),  which  would  lie  within 
the  physical  camera 
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Scaled  Orthographic  Projection 

A  Scaled  Orthographic  Projection  (SOP)  is  a  first  order  approximation  of  the  perspective  pro¬ 
jection.  It  assumes  that  the  variation  in  the  direction  of  the  optical  axis  is  small  compared  to  the 
viewing  distance.  In  SOP,  the  image  of  a  point  Mi  of  an  object  is  a  point  of  the  image  plane 
G  which  has  coordinates: 

Xl=XfflZ^  (2.11) 

yl  =  YfflZ^  (2.12) 

where  Zq  is  the  z-coordinate  of  reference  point  Mq  in  the  camera  coordinate  frame,  the  ratio 
s  =  fjZQ  is  called  the  scaling  factor  of  the  SOP. 


yC 


Figure  2.7:  Scaled  Orthographic  Projection  (after:  [29]):  An  object  is  given  by  three  points  Mq  the 
reference  point  and  Mi,Mj-  The  coordinates  of  all  objects  points  are  expressed  in  function  of  the  Z 
coordinate  of  the  reference  point. 


Camera  Rotation  and  Translation 

Rotations:  Three  basic  rotations  are  performed  separately  [33].  These  rotations  are: 
•  Rotation  along  x-axis  with  angle  (j)  also  known  as  Roll 


/  1  0  0  \ 

0  cos  0  —  sin  0 
\  0  sin^  cos<^  / 


•  Rotation  along  y-axis  with  angle  6  also  known  as  Pitch 


(2.13) 
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World  Coordinate 

Figure  2.8:  Camera  Rotation  and  Translation 


/  cosO  0  sin0  ^ 

Ry(e)  =  0  10 

^  —  sin0  0  cosO  / 

Rotation  along  z-axis  with  angle  y/  also  known  as  Yaw 


Rz{¥) 


/  cost//  —  sint//  0  \ 
sin  y/  cos  t//  0 

Vo  0  1/ 


The  resulting  overall  rotation  matrix  is 


R  =  Rz{yr)  X  Ry{yr)  X  R^{^) 


^  Rn  R\2  ^13  ^ 
R2\  R22  R23 
\  ^31  R32  R33  / 


(2.14) 


(2.15) 


(2.16) 
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where 


Rn  =  cos  V/ cos  0 

Ri2  =  —  sini/zcos^  +  cos  i// sin  0  sin  ^ 

Ri3  =  sin  i// sin  0  +  cos  i// sin  0  cos  ^ 

R21  =  sin  i// cos  0 
R22  =  cos  V/cos^ 

R23  =  —  cos  i//  sin  ^  +  sin  i/a  sin  0  cos  (j) 

R31  =  —  sin0 
R32  =  cos  0  sin  (j) 

R33  =  cos  0  cos  (j) 

Translation  Vector:  The  translation  vector  is  a  3D  vector  composed  of  displacement  along 
^-axis,  y-axis,  and  z-axis 

■  T,  ■ 

T=  Ty  (2.17) 

T 

L  J 
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CHAPTER  3: 

Computer  Vision  Methods  for  UAV  to  UAV  Detection 


This  chapter  provides  the  theoretieal  foundation  for  the  eomputer  vision  algorithms  that  will  be 
used.  The  first  seetion  deals  with  the  deteetion  problem  by  presenting  two  approaehes,  where  the 
former  uses  edge  deteetion  and  image  smoothing  while  the  latter  uses  a  morphologieal  filtering 
and  eolor-based  deteetion  approaeh.  The  seeond  seetion  of  this  ehapter  presents  diseussion  on 
the  implementation  of  these  approaehes  and  the  experimental  results. 

3.1  UAV  Detection  Algorithms 

3.1.1  UAV  Detection  using  Edge  Detection  and  Image  Smoothing 

Several  steps  are  involved  in  this  algorithm  (Figure  3.1). 


Stream  Video 

Frames  grabber  (30  fps)  U - [I  ’o 


D 


RGB  to  gray  scale 
conversion 


^  Median  Filter 


^  Sobel  Edge  detection 


Input  image 


Obiect  Detection 
(Object  X,  y,  z) 


Components  filtering  + 
outlier  suppression 


Connected  components 
extraction 


Figure  3.1:  UAV  Detection  using  Edge  Detection  and  Image  Smoothing 

The  first  step  in  this  approaeh  is  the  “frame  grabber,”  in  whieh  the  input  video  is  eon- 
verted  into  frame  sequenees  at  a  regular  rate  ealled  Frames  Per  Seeond  (FPS).  The  seeond  step 
is  “Red,  Green,  Blue  (RGB)  to  Gray  seale  eonversion.”  In  an  RGB  image,  a  eolor  pixel  is  de- 
seribed  by  a  triple  (red  (R),  green  (G),  and  blue  (B))  of  eolor  values.  The  purpose  of  this  step  is 
to  map  the  eolor  values  to  a  single  number,  sueh  as  a  gray  seale  value  for  intensity,  to  represent 
eaeh  pixel.  For  example,  the  intensity  information  is  eomposed  exelusively  of  shades  of  gray, 
varying  from  black  at  the  weakest  intensity  to  white  at  the  strongest.  Several  methods  have  been 
used  to  perform  this  eonversion  to  generate  a  single  representative  value  for  eaeh  pixel: 
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The  lightness  method  averages  the  most  prominent  and  least  prominent  eolors: 


max(R,  G,  B)  +  min(R,  G,  B) 

2 

•  The  average  method  simply  averages  the  three  eolor  values: 

R  +  G  +  B 
3 

•  The  luminosity  method  is  a  more  sophisticated  version  of  the  average  method  [34].  It  also 
averages  the  values,  but  it  forms  a  weighted  average  to  account  for  human  perception. 
The  formula  for  luminosity  is: 


0.21R  +  0.71G  +  0.07B 

The  next  step  is  “median  filtering,”  which  is  a  non-linear  spatial  filter  widely  used  in 
image  processing  because  it  preserves  useful  detail  in  the  image  while  reducing  noise.  This  is 
a  preliminary  step  to  improve  the  later  effort  of  edge  detection.  Roughly  speaking,  the  median 
filter  takes  the  median  value  of  pixels  in  a  predefined  window  [35]. 
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Original  image  Image  after  applyinga  median  filter 

3x3  window 


Figure  3.2:  Median  Filtering  (from:  [35]):  The  window  size  is  3  x  3  pixels.  If  we  consider  the  pixel  at 
coordinates  (2,2)  in  the  image  matrix  which  has  a  value  equal  to  200,  its  neighborhood  values  (inside 
the  defined  window)  are:  100, 100, 100, 100,205, 100, 195,200,  for  which  the  median  value  is  100  and 
replaces  the  first  value  (200). 


After  median  filtering  is  performed,  the  frame  is  passed  through  Sobel  edge  detection. 
Edge  detection  is  a  fundamental  operation  in  computer  vision.  Edges  are  significant  local 
changes  of  intensity  in  an  image  and  they  typically  occur  on  the  boundary  between  two  dif- 
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ferent  regions  in  an  image.  Edge  deteetion  extraets  salient  features  of  the  image  and  produees 
a  line  drawing  of  the  image.  The  Sobel  edge  deteetor  is  a  ID  spatial  gradient  measurement 
on  an  image.  It  uses  a  eonvolution  kernel  to  ereate  a  series  of  gradient  magnitudes  so  that  it 
emphasizes  regions  of  high  spatial  frequeney  that  eorrespond  to  edges  [36] .  The  Sobel  edge 
deteetor  uses  a  pair  of  3  x  3  eonvolution  kernels.  An  example  of  two  assoeiated  kernels  is: 


-1  0  1 

-2  0  2 

-1  0  1 


The  next  step  is  to  perform  “eonneeted  eomponents  extraetion  and  image  labeling.”  In 
image  proeessing,  a  “eonneeted  eomponents”  algorithm  groups  pixels  into  eomponents  based 
on  pixel  eonneetivity.  That  is,  it  finds  regions  of  eonneeted  pixels  that  have  the  same  value.  Onee 
all  eomponents  have  been  determined,  eaeh  pixel  is  labeled  aeeording  to  the  unique  identifier 
of  its  blob,  identifying  blob  membership.  Finally,  a  filtering  operation  is  performed  to  eliminate 
outliers  and  non-persistent  features  and  to  isolate  and  deteet  the  objeet.  The  resulting  output  is 
the  eenter  loeation  of  the  deteeted  objeet,  usually  represented  in  the  image  plane. 


3.1.2  Morphological  Filtering  and  Color-Based  Detection 

This  algorithm  is  based  on  the  use  of  a  finite  number  of  eolor  markers  attaehed  to  the  UAV, 
as  in  Figure  2.1  with  six  markers.  They  are  spatially  plaeed  in  an  asymmetrie  manner  that 
helps  faeilitate  robust  deteetion  and,  as  will  be  seen  in  later  in  this  thesis,  assist  in  aeeurate  pose 
estimation  for  planar  points.  The  goal  here  is  to  deteet  these  six  eolored  patehes  and  determine 
their  positions  in  the  image  plane. 

Color  Recognition 

In  order  for  a  eomputer  to  proeess  images  aequired  by  external  eleetro-optieal  sensors,  the 
image  is  eneoded  using  different  eolors  spaees.  A  eolor  spaee  deseribes  the  range  of  eolors 
that  the  eamera  ean  see.  The  eolor  spaee  is  used  to  represent  eaeh  eolor  pixel  in  an  image. 
For  this  thesis,  we  introduee  the  two  main  eolor  spaees  used  in  eomputer  vision,  whieh  are  the 
RGB-spaee  and  the  HSV-spaee. 

•  RGB  Color  Space:  The  RGB  eolor  spaee  [37]  uses  additive  eolor  mixing;  it  is  “all 
possible  eolors”  that  ean  be  made  from  three  eolorants  for  red,  green  and  blue.  It  deseribes 
what  kind  of  light  needs  to  be  emitted  to  produee  a  given  eolor.  In  the  RGB  eolor  spaee. 
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Figure  3.3:  Light  Beacon  Design:  This  figure  depicts  six  red  markers  placed  in  an  asymmetric  manner 
on  the  UAV’s  surface.  These  markers  are  coplanar  as  we  are  dealing  with  coplanar  case 

individual  values  for  red,  green  and  blue  are  stored  as  deseribed  in  thevus  seetion.  A 
eommon  notation  for  RGB  pixels  is  I{x^y^band)  where  /  is  the  image;  x^y  are  the  x  and  y 
pixel  coordinates,  and  band  is  the  color  channel,  either  one  for  red,  two  for  green  or  three 
for  blue. 

•  HSV  Color  Space:  The  HSV  (hue,  saturation,  value)  color  space  [37],  also  known  as 
HSB  (hue,  saturation,  brightness),  is  often  used  by  artists  because  it  is  generally  more 
natural  to  think  about  a  color  in  terms  of  hue  and  saturation  rather  than  in  terms  of  ad¬ 
ditive  or  subtractive  color  components.  The  HSV-space  is  a  transformation  of  the  RGB 
colorspace,  and  its  components  and  colorimetry  are  relative  to  the  RGB  colorspace  from 
which  it  was  derived. 

Detection  Process 

Here,  a  colored  image  of  three  channels,  that  is,  R,  G,  and  B,  is  taken.  Thus,  each  pixel  has  a  red, 
green  and  blue  value.  The  aim  is  to  retrieve  these  values  from  an  image,  then  extract  and  isolate 
specifically  the  red  patches.  (Red  is  selected  arbitrarily  and  the  described  approach  extends 
naturally  to  other  colors.)  In  an  ideal  case,  a  red  pixel  means  R=255,  G=0,  and  B=0.  However,  in 
reality,  there  will  also  be  green  and  blue  components,  too.  In  order  to  do  segmentation  according 
to  color,  the  following  three-step  algorithm  are  applied,  as  developed  in  open  source  [38]: 

Thresholding:  Image  thresholding  is  a  basic  technique  used  for  image  segmentation.  The  main 
purpose  is  to  extract  the  object  of  interest  from  the  background.  That  is,  it  is  necessary  to  select 
a  threshold  5  that  separates  these  two  image  components.  If  we  consider  that  the  binary  image 
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is  represented  by  I{x^y)  (negleeting  band),  then  those  pixels  {x,y)  for  whieh  I{x,y)  >  5  are 
eonsidered  points  representing  the  objeet. 


Morphological  Filtering:  Morphologieal  filtering  [39]  is  based  on  two  primary  morphologieal 
operations  known  as  dilation  and  erosion.  The  dilation  of  a  gray  seale  image,  I{x,y),  by  a 
morphologieal  strueturing  elements,  SE{x,y),  is  defined  by  Equation  3.1  (from:  [40]),  where  © 
denotes  the  dilation  operator  and  p  is  a  pixel  in  the  image  7: 

/©5E=lJSEp  (3.1) 

pa 

Dilation  is  used  for  filling  gaps  in  images;  it  gradually  adds  more  individual  pixels  to  the  output 
image  when  sweeping  1  with  SE  (Figure  3.4).  That  is,  it  grows  image  regions.  For  eaeh  pixel  in 
the  image  I  we  superimpose  the  strueturing  element  SE  on  top  of  the  image  matrix  so  that  the 
origin  of  the  strueturing  element  eoineides  with  the  eurrent  pixel  position.  If  the  origin  of  SE 
matehes  the  underneath  pixel  in  7,  the  eontents  of  SE  are  eopied  to  the  resultant  image. 

In  eontrast,  the  erosion  of  a  gray  seale  image,  I{x,y),  by  morphologieal  strueturing 
elements,  SE{x,y),  is  denoted  by  the  0  operation  and  its  definition  is  given  by  Equation  3.2 
(from:  [40]),  where  p  is  a  pixel  in  the  image  7  and  5  is  a  pixel  in  the  strueturing  element  SE\ 

lQSE  =  {p\p  +  seTiseSE}  (3.2) 

Erosion  is  used  to  eliminate  unwanted  noise  in  a  proeessed  image  by  shrinking  image 
regions  (Figure  3.4).  For  each  pixel  in  the  image  7,  the  structuring  element  SE  is  superimposed 
on  the  image  matrix,  so  that  the  origin  of  the  structuring  element  coincides  with  the  current  pixel 
position.  If  every  pixel  underneath  the  structuring  element  coincides  with  the  pixel  of  SE,  the 
current  pixel  is  considered  a  foreground  pixel  and  set  to  one;  otherwise,  the  pixel  is  considered 
a  background  pixel  and  set  to  zero. 

The  combination  of  these  two  basic  operations  form  morphological  opening  and  mor¬ 
phological  closing.  The  morphological  opening  is  an  erosion  followed  by  a  dilation  (Fig¬ 
ure  3.5),  using  the  same  structuring  element  SE  for  both  operations.  The  opening  operation 
has  basically  the  same  effect  as  erosion,  but  it  is  less  destructive  in  the  sense  that  it  tends  to 
preserve  foreground  pixels  that  have  a  similar  shape  as  SE. 
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Figure  3.4:  Basic  morphological  operations  (after:  [39]):  The  figure  illustrate  a  binary  image  1  to  which 
we  apply  a  morphological  erosion  and  dilation  using  the  structuring  element  SE. 


On  the  other  hand,  the  morphologieal  elosing  is  a  dilation  followed  by  an  erosion  (Fig¬ 
ure  3.5),  using  the  same  strueturing  element  SE  for  both  operations.  It  is  similar  to  dilation,  but 
it  is  less  destruetive  in  a  sense  that  fills  in  holes  and  narrows  valleys  along  edges  in  the  image, 
I. 


Structuring  element  (SE) 
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Figure  3.5:  Advanced  morphological  operations  (after:  [39]):  A  dilation  followed  hy  an  erosion  gives  a 
morphological  closing,  and  an  erosion  followed  hy  dilation  gives  morphological  opening  using  the  same 
structuring  element  SE. 


Blob  Masking  and  Filtering  Process:  After  morphologieal  filtering,  the  next  step  is  “blob 
masking  and  filtering  proeess,”  where  the  resulting  image  form  the  previous  steps  is  used  to  fill 
baekground  holes,  for  example,  by  using  the  Matlab  function  imf  ill.  A  mask  for  red  color 
is  then  created,  so  that  when  applied,  the  masked  image  extracts  the  red  blobs  with  a  black 
background  (that  is,  non-masked  pixels  are  set  to  zero).  A  filtering  step  is  needed  to  suppress 
residual  noise  from  the  previous  operations.  The  last  step  is  to  compute  the  centroid  of  the 
individual  red  blobs  in  the  image  coordinate  frame  (Figure  3.12). 
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3.2  Implementation  and  Experimental  Results 

3.2.1  Detection  Metrics 

Many  measures  of  deteetion  performanee  that  are  eommonly  used  ean  readily  be  found  in  the 
literature,  ineluding: 

•  High  eorreet  (positive)  deteetion  rate 

•  Low  false  alarm  rate 

•  Real-time  operation  measured  by  eomputation  time  (of  algorithms) 


3.2.2  UAV  Detection  using  Edge  Detection  and  Image  Smoothing 

To  evaluate  and  test  this  approaeh  video  of  one  approaehing  plane  on  USS  JOHN  C.  STENNIS 
(CVN-74)  was  used,  eourtesy  of  previous  work  done  by  Ryan  S.  Yusko  [39]. 


As  mentioned  in  Seetion  3.1.1,  the  first  step  is  to  perform  the  frame  grabbing.  The 
approaeh  video  was  stored  in  MPEG-2  format  with  a  resolution  of  720  pixels  wide  by  480 
pixels  high,  interlaeed  at  29.97  EPS  [39].  We  developed  a  Matlab  funetion  to  read  the  video 
frame  by  frame  and  save  eaeh  frame  as  a  JPEG  image.  Eigure  3.6  illustrates  the  eolleetion  of 
grabbed  frames. 


Figure  3.6:  Frame  Grabbing:  Illustration  of  frame  grabbing  that  takes  as  input  the  test  video  and  converts 
it  to  individual  frames  at  near  30  frames  per  seconds. 


The  seeond  step  is  “frame  eropping.”  After  some  experienees  with  these  frames,  we 
noted  that  the  annotations  that  show  on  the  video  make  the  deteetion  diffieult,  so  it  was  deeided 
to  define  a  Region  of  Interest  (ROI)  as  being  the  upper  right  quadrant  of  the  image  frame  to 
keep  only  the  ROI  (Figure  3.7).  The  “eropping  funetion”  developed  in  Matlab  ean  be  found  in 
the  Appendix. 


27 


Cropped  frame  1  Cropped  frame  100  Cropped  frame  200 


Figure  3.7:  Frame  cropping:  It  has  been  noted  that  most  of  time  the  approaching  plane  is  on  the  upper 
right  corner  where  both  a:  and  y  coordinate  are  positive,  so  a  cropping  operation  was  needed  eliminate 
the  digital  numbers  that  comes  with  the  recorded  video,  which  makes  the  detection  harder. 


For  illustration  purposes,  we  ehose  frame  number  20  for  applieation  of,  respeetively, 
RGB  to  gray  seale  eonversion,  Sobel  deteetion  to  deteet  eonneeted  eomponents,  and  filtering 
and  outlier  suppression  as  outlined  previously.  The  sequenee  of  the  proeess  is  illustrated  in 
Figure  3.8. 


Figure  3.8:  Detection  using  Edge  Detection  and  Image  Smoothing,  showing  application  of  (a)  RGB  to 
gray  scale  conversion,  (b)  connected  components  detection,  and  (c)  filtering  and  outliers  suppression 


Detection  Rate  Results 

We  performed  objeet  deteetion  experiments  using  the  edge  deteetion  and  image  smoothing  ap- 
proaeh  over  191  frames,  with  the  following  results: 


Number  of  tested  frame 

Deteetion 

Missed  deteetion 

Mean  exeeution  time  [see] 

191 

169 

22 

1.4 

Table  3.1:  Detection  rates:  The  probability  of  detection  is  0.884  and  the  probability  of  missed  detection 
is  0.115 


Indeed,  this  approaeh  shows  high  deteetion  rate  with  a  deteetion  probability  of  0.884  and 
a  mean  exeeution  time  of  1 .4  seeonds.  The  mean  exeeution  time  is  the  average  eomputation  time 
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measured  after  running  the  algorithm  on  191  frames  with  a  HP  laptop  Intel  Core  i5  and  Matlab 
as  programming  environment.  The  assumption  under  which  this  approach  was  implemented  is 
that  there  are  no  other  objects  on  the  scene  having  the  same  size  as  the  target  UAV.  That  is,  this 
approach  exhibits  high  false  alarm  rates  in  highly  cluttered  image  environments  where  many 
objects  appear  to  have  in  the  same  shape  as  the  target  UAV.  Investigation  of  methods  to  address 
this  issue  are  reserved  for  future  work. 

3.2.3  Morphological  Filtering  and  Color-Based  Detection 

This  approach  is  mainly  inspired  by  the  work  in  [38];  however,  that  study  used  HSV  as  color 
space,  while  the  present  study  use  RGB.  Though  the  emphasis  in  this  section  is  to  describe  the 
approach  as  applied  to  UAV  detection,  its  use  is  further  motivated  by  our  intent  to  apply  it  to 
the  pose  estimation  problem  in  Chapter  4.  Six  red  patches  are  used  in  an  asymmetric  configu¬ 
ration  on  the  UAV  plane  as  previously  shown  in  Figure  3.3.  This  method  uses  a  morphological 
structuring  element  having  a  disk  shape  with  a  radius  =  2  pixels  using  the  Matlab  function  call 
(see  also  Appendix): 

structuringElement  =  streK’disk’ ,  2) 

Then  the  morphological  closing  operation  is  applied  to  the  image  to  enlarge  the  bound¬ 
aries  of  foreground  (bright)  regions  and  shrink  background  color  holes  in  such  regions  using 
the  following  Matlab  function: 

redObjectsMask  =  imclose(redObjectsMask,  structuringElement) 

The  next  step  in  this  approach  is  to  determine  the  color  threshold  (Figure  3.10).  using 
the  individual  color  bands,  as  illustrated  in  Figure  3.9.  After  determining  the  upper  and  lower 
bound  of  each  color  band’s  threshold,  these  thresholds  are  applied  to  mask  green  and  blue 
colors,  while  keeping  the  red.  These  thresholding  steps  provide  the  input  to  the  morphological 
filter,  namely  the  image  showing  only  red  objects,  as  illustrated  in  Figure  3.1 1. 

The  next  step  in  the  detection  process  extracts  connected  components  (red  markers)  and 
determines  their  center  of  mass  locations  in  the  image  plane  coordinate  frame,  as  illustrated  in 
Figure  3.12.  These  locations  are  then  recorded  and  used  to  register  detection  of  the  UAV. 

Detection  Rate 

The  metric  for  the  presented  detection  algorithm  is  the  number  of  red  spots  detected.  That  is, 
if  more  than  four  red  patches  are  successfully  detected,  the  detection  process  for  localizing  the 
UAV  object  within  an  image  is  considered  successful.  To  test  this  algorithm,  we  generated  a 
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Figure  3.9:  Color  Band  Extraction:  Individual  color  bands  of  red,  green,  and  blue  (panels  (b),  (c),  (d), 
respectively)  were  extracted  and  saved,  from  the  original  image  shown  in  (a). 


xlO^ 


(a) 


(b) 


Figure  3.10:  Threshold  Extraction:  Determine  a  threshold  for  each  color,  e.g.,  by  use  of  color  his¬ 
tograms:  (a)  histogram  for  all  three  color  bands,  (b)  histogram  associated  only  with  the  red  band. 


Masked  Original  Image 
Showing  Only  the  Red  Objects 


Figure  3.11:  Red  Color  Detection:  The  individual  color  thresholds  are  applied  to  mask  green  and  blue 
colors,  while  keeping  the  red.  The  resulting  image  is  shown,  highlighting  the  locations  of  the  six  red 
patches  placed  on  the  UAV  body. 


Figure  3.12:  Color  blobs’  Centers  of  Mass:  The  center  of  gravity  for  each  individual  red  blob  is  deter¬ 
mined  and  saved.  Their  coordinates  are  computed  on  the  image  plane  coordinate  frame. 
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large  number  of  images  in  laboratory  settings  using  different  configurations  of  red  patches  on 
the  UAV,  and  obtained  the  following  results. 


Number  of  runs 

Detection 

Missed  detection 

Mean  execution  time  (seconds) 

40 

38 

2 

2 

Table  3.2:  Detection  rates:  Where  “number  of  runs”  are  the  number  of  randomly  generated  red  color 
markers  in  a  planar  distribution.  In  this  approach  the  probability  of  detection  was  0.95  and  the  probability 
of  missed  detection  was  0.05 

Indeed,  this  approach  shows  high  detection  rate  with  a  detection  probability  of  0.95  and 
a  mean  execution  time  of  2  seconds  measured  after  running  the  algorithm  on  40  runs  with  a 
HP  laptop  Intel  Core  i5  and  Matlab  as  the  computation  environment.  The  assumption  under 
which  this  approach  was  implemented  is  that  there  are  no  other  red  objects  on  the  scene  and  the 
lighting  conditions  are  suitable  for  detection  red  color.  That  is,  this  approach  exhibits  high  false 
alarm  in  the  presence  other  sources  of  red  color  and  in  a  poor  lighting  conditions.  Robustness 
studies  are  left  for  future  study. 

3.3  Summary  of  Results 

The  first  detection  method  “UAV  detection  using  edge  detection  and  image  smoothing”  shows 
high  detection  rate  with  a  detection  probability  0.884  and  a  mean  execution  time  of  1.4  seconds, 
which  means  the  average  computation  time  measured  after  running  it  on  191  frames  with  a  HP 
laptop  Intel  Core  i5  and  Matlab  as  the  computation  environment.  This  approach  was  imple¬ 
mented  under  the  assumption  that  there  are  no  other  objects  in  the  scene  having  the  same  size  as 
the  target  UAV.  That  is,  it  exhibits  high  false  alarm  in  highly  cluttered  sky  environments  where 
many  objects  appear  to  be  the  same  shape  as  the  target  UAV.  A  tracking  function  can  be  used  to 
help  eliminate  these  false  alarms. 

The  second  detection  method  “morphological  filtering  and  color-based  detection”  where 
red  color  markers  are  placed  on  the  surface  of  the  target  UAV  in  an  asymmetric  configuration. 
This  method  shows  high  detection  rate  with  a  detection  probability  0.95  and  a  mean  execution 
time  of  2  seconds,  which  means  the  average  computation  time  measured  after  running  it  on  40 
random  red  patch  distribution.  This  approach  was  implemented  under  the  assumption  that  the 
target  UAV  is  detected  if  it  is  possible  to  detect  four  red  patches.  That  is,  it  exhibit  low  detection 
rate  in  low  lighting  condition  and  the  target  UAV  orientation  with  respect  to  the  camera.  It  also 
faces  false  alarm  problem  when  there  are  other  sources  of  red  color. 
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CHAPTER  4: 

Computer  Vision  Methods  for  UAV  Pose  Estimation 


This  chapter  addresses  the  UAV  pose  estimation  problem.  The  first  seetion  presents  a  theoretieal 
foundation  for  three  pose  estimation  methods  ineluding  one  that  provides  pose  estimation  using 
planar  homography,  a  non-iterative  pose  estimation  approaeh  developed  by  researehers  at  EPFL, 
and  applieation  of  the  Posit  algorithm  for  eoplanar  points  to  estimate  pose.  The  seeond  seetion 
presents  their  implementation  and  experimental  results,  followed  by  a  summary  of  results. 

4.1  Pose  Estimation  Methods 

The  goal  of  pose  estimation  is  to  get  the  position  and  orientation  of  an  objeet  (i.e.,  six  degrees 
of  freedom)  from  a  2D  image  (e.g.,  from  a  CCD  eamera).  That  is,  given  a  ealibrated  eamera,  an 
objeet  with  identifiable  feature  points,  and  their  eorresponding  points  on  the  image  plane,  the 
goal  is  to  obtain  the  rotation  (roll,  piteh,  yaw)  and  the  translation  (x,  y,  z)  of  the  objeet  relative 
to  a  known  referenee  frame,  as  illustrated  in  Figure  4.1.  As  we  mentioned  in  Chapter  1,  this 
study  foeuses  on  planar  feature  points,  as  the  UAV  model  used  has  a  nearly  planar  shape. 


Figure  4.1:  Pose  Estimation  Process:  A  generic  pose  estimation  algorithm  takes  as  input  the  3D  features 
points,  the  image  coordinates  of  these  features  points  on  the  image  plane,  and  the  camera  intrinsic 
parameters  and  provides  as  output  the  rotations  angles  and  the  translation  vector. 

4.1.1  Pose  Estimation  using  Planar  Homography 

The  technique  used  herein  follows  from  the  method  proposed  by  Yang  and  his  eolleagues  [27], 
although  pose  estimation  using  homographies  has  extensive  literature  with  applieations  inelud¬ 
ing  finding  ground  planes,  walls,  and  roads. 

Problem  Formulation 

Suppose  there  are  N  object  points  represented  by  its  homogeneous  eoordinates  in  the  UAV 
referenee  frame,  where  the  point  is  represented  by  coordinates,  M,  =  (Y®,  ,  Wi)^ .  Recall 

that  the  superscript  B  denotes  the  (UAV)  body-fixed  frame.  The  corresponding  homogeneous 
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coordinates  of  their  images  with  respeet  to  the  image  plane  referenee  frame  are  denoted  by 
superseript  7,  sueh  that  eaeh  point  is  represented  by  m,-  =  The  terms  Wi  and  A, 

are  the  homogeneous  parameters  and  for  simplieity,  are  usually  set  to  one.  The  geometry  and 
assoeiated  referenee  frames  are  illustrated  in  Figure  4.2.  Taking  into  aeeount  that  all  points 
are  eoplanar  points,  the  UAV  body  frame  is  adjusted  in  a  way  that  the  plane  IT,  the  plane  that 
eontains  all  the  feature  points,  eoineides  with  the  plane  OX^  whieh  will  let  Zf  =  0  for  all 
objeet  points.  Given  these  definitions,  the  objeetive  is  to  eompute  the  homography,  77,  from 
whieh  it  is  possible  to  extraet  the  rotation  and  translation  matriees,  R  and  T,  respeetively. 


Figure  4.2:  Perspective  Projection  (after:  [27]):  A  set  of  features  points  are  projected 

into  the  image  plane  using  perspective  projection  resulting  in  four  image  points  {mi,m2,m3,m4}  having 
their  coordinates  represented  on  the  image  plane  reference  frame. 


The  points  Mi  and  their  perspeetive  projeetions  nii  are  related  by  Equation  4.1: 


m,  =  KR 


hx3  T 


Mi 


(4.1) 


where  K  is  the  3x3  eamera  ealibration  matrix  (e.f.  Chapter  2,  Equation  2.8)  and  7?  is  3  x  3 
rotation  matrix  that  represents  the  rotation  of  the  objeet  relative  to  eamera  coordinate  frame  and 
r  is  3  X  1  translation  veetor.  M,  is  4  x  1  homogeneous  eoordinate  of  object  points  i,  while  m/ 
is  3  X  1  homogeneous  coordinate  of  the  eorresponding  image  points.  The  augmented  matrix 


/3x3  T 
vector.  Mi. 


is  a  3  X  4  matrix  formed  by  concatenating  a  3  x  3  identity  matrix  and  the  3x1 
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Equation  4.1  can  be  rewritten  as 


m;  =  K 


Ri  R2  RT 


Mi 


nii  =  HMi 


(4.2) 

(4.3) 


where  Mi  =  (X/*,  7^  ,  WiY  ,  which  is  equivalent  to  point  Mi  without  the  z-component.  Ri  and  R2 
are  the  first  two  columns  of  the  rotation  matrix  7?.  As  shown  in  [27],  the  matrix  R^  R2  RT 
is  invertible  and  as  a  consequence,  defines  a  planar  homography,  H,  which  relates  m/  and  Mi, 
defined  as 


H 


K 


Ri  R2  RT 


(4.4) 


Using  the  fact  that  the  calibration  matrix  K  is  known  (from  prior  calibration  or  from 
provided  specifications),  we  can  multiply  both  sides  of  Equation  4.2  by  the  inverse,  K  The 
product  K  ^nii  is  denoted  by  m'  resulting  in  the  following  equation: 


in';  =  K  ^nij  = 


Ri  R2  RT 


Mi  =  H'Mi 


(4.5) 


where  K  ^  is  the  inverse  of  matrix  K  and  m[  =  (ui,Vi,XiY .  The  transformed  homography 
matrix,  H' ,  (with  elements  hij)  is  given  by 


(fill  fii2  fii3  ^ 
fi21  fi22  ^23 

^31  ^32  ^33  / 


(4.6) 


Note  that  m'  is  equal  to  HMi  up  to  a  scaling  factor;  therefore  their  cross  product  is 
zero,  i.e.,  m'  x  HMi  =  0  where  x  represents  the  cross  product  of  two  vectors.  This  planar 
constraint  can  be  rewritten  as  a  homogeneous  linear  system  of  equations  for  each  point  m[  and 
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Mi  according  to 


Aih 


0lx3  vMi 

hMi  0ix3  -uMi 


( ^ 

^12 

hi3 

hii 

hii 

hii 

hsi 

hsi 

\  ^33  / 


=  0 


(4.7) 


where  0ix3  denotes  a  zero  row-vector,  is  the  transpose  of  Mi,  Ai  G  is  defined  for 
each  point  i,  and  h  E  is  the  column  vector  comprising  elements  of  the  scaled  homography 

matrix,  H\  For  n  object  points,  the  augmented  matrix  A  E  7?2«x9  points  represents  a 

system  of  linear  equations 


A  h  = 


/A,  \ 


A2 


h 


0, 


\An  / 


for  which  singular  value  decomposition  can  be  used  to  solve  for  the  solution,  h.  The  scaled 
homography  matrix,  H',  can  be  reconstructed  from  reshaping  the  column  vector  h. 


The  desired  homography  matrix,  H,  can  now  be  computed  by  normalizing  H' .  The 
Euclidean  distance  was  used  for  normalization  purpose. 


H  =  H' /norm  {H[) 


(4.8) 


The  final  steps  include  using  the  homography  matrix,  H,  to  determine  the  object’s  rel¬ 
ative  rotation  and  translation  matrices.  The  first  and  second  columns  of  H  will  be  (after  being 
converted  to  unit  vectors)  the  first  and  second  columns  of  the  rotation  matrix  R  as  defined  in 
Equation  4.4.  The  third  column  in  the  rotation  matrix  can  be  computed  as  the  cross  product  of 
the  first  two  columns,  such  that 


R=(^  Hi  H2  H1XH2) 


(4.9) 
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The  translation  vector,  T,  can  be  computed  using  the  following  expression,  where  Ht,  = 
Hi  XH2: 

T  =  -R^H3  (4.10) 

4.1.2  Non-iterative  Pose  Estimation 

A  group  of  researchers  from  the  Ecole  Polytechnique  Federale  de  Lausanne  (EPFL)  proposed 
a  non-iterative  solution  approach  (referred  to  as  the  “EPFL  algorithm”  in  this  thesis)  to  the 
“Pose  from  n  Points”  (PnP)  problem.  They  showed  that  their  solution’s  computational  com¬ 
plexity  grows  only  linearly  with  n,  the  number  of  features  points,  and  possesses  the  advantage 
of  applying  to  both  planar  and  non-planar  feature  point  configurations. 

Formulation 

Recall  that  in  the  previously  presented  approach,  it  is  assumed  that  the  camera  intrinsic  param¬ 
eters  given  by  its  calibration  matrix  K  are  known,  and  a  set  of  n  correspondences  between  3D 
reference  points  (a.k.a,  object  points)  and  their  2D  projections  is  available.  The  main  idea  be¬ 
hind  the  EPFL  approach  presented  in  this  section  is  to  define  the  coordinates  of  the  n  reference 
points  as  a  linear  combination  of  four  known  locations  called  control  points.  Denote  by  Mu  for 
i=  1 , 2, . . . ,  n  the  reference  point  in  a  general  reference  frame,  and  the  j*  control  point  by  Cj, 
for  j  =  1,2, 3,4. 

Using  the  idea  of  the  four  control  points  as  the  bases  for  each  reference  point,  the  point 
coordinates  of  each  Mi  in  the  UAV  body  coordinate  frame  (denoted  by  superscript  B)  can  be 
expressed  as  follows: 


4  4 

Mf=Y,0CijC^,  with  £a,7  =  l  (4.11) 

7=1  7=1 


where  the  aij  are  the  normalized  barycentric  coordinates,  which  are  uniquely  defined  for  a  given 
point  Mi  [28].  In  the  same  way,  the  reference  points  in  the  camera  coordinate  frame  (superscript 
c)  are  expressed  as 

4 


Mf 


Ot/yCj 

7=1 


(4.12) 


It  remains  to  define  the  coordinates  of  the  control  points  such  that  the  above  definitions 
in  Equations  4.11  and  4.12  hold.  The  stability  of  the  proposed  approach  has  been  shown  in  the 
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literature  to  be  improved  by  taking  the  centroid  of  the  n  reference  points  as  one  of  the  control 
points,  and  then  selecting  the  remaining  three  control  points  such  that  they  form  a  basis  aligned 
with  the  principal  directions  of  the  reference  points. 

Recalling  the  camera  calibration  matrix,  K,  the  2D  projections  of  the  n  reference  points 
A/,,  /  =  1 , 2, . . . ,  n  are  represented  by 


/  m  \ 


Wi 


Vi 


KM^  =  KY,aijC‘j 


V  1  J 


j=i 


(4.13) 


where  Wi  are  scalar  projective  parameters,  the  projections  in  ID  are  denoted  Ui  and  v/,  and  the 
elements  of  the  control  point  coordinate  vector  are: 


yj 

\z^jJ 


Notice  from  Equation  4.13,  combined  with  the  definition  of  K  in  Equation  2.8,  that 

4 

Wi  =  E  Ctiyzf.  Substitution  of  this  expression  in  the  first  two  rows  of  Equation  4.13  results  in 

7=1 

the  following  two  linear  equations: 

4 

^  {0Cijfux‘j  +  aij{cx  -  Ui)z‘j)  =  0  (4.14) 

/=1 
4 

Y,  -  Vi)z])  =  0  (4.15) 

If  these  two  linear  equations  are  concatenated  for  all  n  reference  points,  the  resulting 
homogeneous  linear  system  can  be  represented  as  Ax  =  0,  where  the  solution  is  given  by  .r  G 

^12x1. 

I  CS  ) 


which  comprises  the  camera  coordinates  of  each  control  point,  and  A  is  a  2n  x  12  matrix.  That 
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is,  the  solution  for  the  homogeneous  linear  system  belongs  to  the  kernel  of  matrix  A;  therefore 
it  is  expressed  as 

dim^(A^A) 

-^=  52 

i=i 

where  dim,/k(A^A)  is  the  dimension  of  the  null-space  of  A^A,  and  v;  are  the  right- singular 
vectors  of  A  corresponding  to  the  singular  values  of  A,  from  which  the  coefficients,  /3/,  can  be 
determined.  A  more  detailed  description  of  this  method  and  its  derivation  can  be  found  in  [28] . 


4.1.3  Pose  Estimation  using  Posit  Algorithm  for  Coplanar  Points 

DeMenthon  and  Davis  [25]  developed  a  method  called  “Pose  from  Orthographic  Projection  and 
Scaling  with  Iterations,”  also  known  as  the  Posit  algorithm.  The  method  can  estimate  a  pose 
of  an  object  from  a  single  image  provided  that  four  or  more  point  correspondences  between  3D 
and  2D  are  given.  Two  main  parts  involved  in  this  method  are: 

•  POS  (i.e..  Pose  from  Orthography  and  Scaling),  which  uses  scaled  orthographic  projec¬ 
tion  to  approximate  the  perspective  projection  and  finds  the  rotation  matrix  and  translation 
vector  from  a  linear  system  of  equations,  and 

•  Posit  (POS  with  Iteration),  which  iteratively  uses  a  scale  factor  for  each  point  to  enhance 
the  found  orthographic  projection  and  then  uses  POS  on  the  new  points  instead  of  the 
original  ones  until  a  threshold  is  met. 


Notation 

Figure  4.3  depicts  the  classic  pinhole  camera  [25]  with  the  following  parameters: 

•  The  camera  center  of  projection  is  the  point  C 

•  The  image  plane  G  at  a  distance  /  (focal  length)  from  C 

•  CX^  and  axis  points  along  the  rows  and  columns  of  camera  sensor,  the  CZ^  points 
along  the  optical  axis.  Their  unit  vectors  noted  i,  j,  k  ,  respectively. 

It  also  shows  the  UAV  body  coordinate  frame  having  its  center  at  point  Mq,  called  the  reference 
point,  and  its  three  axes,  MqX^,  MqY^  ,  and  MqZ^.  The  plane  K  is  the  plane  parallel  to  image 
plane  G  passing  through  Mq.  The  images  of  the  feature  points  M,  are  called  m,  with  (A,,y/)  as 
image  coordinates. 
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Problem  Definition 

The  purpose  is  to  compute  the  rotation  matrix  R  and  the  translation  vector  of  the  UAV  coordinate 
frame.  The  rotation  matrix  is  given  by: 


R  = 


Jx 


y 

Jy 

K-y 


Jf 


”1 

r  1 

■bT 

J 

- 1 

(4.17) 


where  the  matrix  rows  are  the  coordinates  of  the  unit  vectors  i,  j,  and  k  of  the  camera  coordinate 
frame  as  expressed  in  the  UAV  coordinate  frame.  In  order  to  compute  R,  it  is  only  necessary 
to  compute  vectors  i  and  j  in  the  UAV  coordinate  frame.  The  vector  k  can  be  computed  as  the 
normal  vector  found  by  the  cross-product  i  x  j,  referring  to  Figure  4.3. 


Figure  4.3:  Posit  Projections  (after:  [25]):  This  figure  illustrates  a  feature  point  M,  as  well  as  the  refer¬ 
ence  point  Mq.  The  perspective  projection  of  M,  is  the  point  m,,  and  its  SOP  is  the  point  pi.  The  point 
Mo  has  the  same  image  mq  in  SOP  and  perspective  projections. 


The  translation  vector  is  the  vector  C  Mq  between  the  center  of  projection,  C,  and  the 
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reference  point  Mq.  This  vector  is  aligned  with  C  niQ  and  equal  to 


Zo//Cmo  (4.18) 

so  the  translation  is  fully  determined  if  Zq  is  computed.  Each  object  point  Mi  is  projected  onto 
point  nii  on  the  image  plane,  which  leads  to  the  following  equalities: 


xo  =  fXolZo  (4.19) 

Xi  =  fXilZi  (4.20) 

for  X  coordinates  and  similarly  for  y  coordinates: 

yo  =  /Eo/Zo,  (4.21) 

yi  =  fYilZi.  (4.22) 


Equation  4.20  can  be  expanded  into 

MqMI  -I  +  Xq 

■Xi  =  J - ^ - 

MoMi-k  +  Zo 

_  [f /ZqXMqMi)-1+Xq 
~  (1/ZoxMoM,-)4+1 


(4.23) 

(4.24) 


where  the  second  equation  is  found  by  dividing  numerator  and  denominator  by  Zq. 


It  has  been  shown  [25]  that  a  necessary  and  sufficient  condition  for  a  pose  defined  by 
i,  j,  XQ^yo,  and  Zq  to  be  exact  pose  is  that  these  parameters  satisfy,  for  all  points  Mi,  the  equations: 


MqMI  ■  7  =  X;  ( 1  +  £;■)  —  xq  (4.25) 

MQMi-J  =  yi{\+£i)-yQ  (4.26) 

where 

J  =  fxj/ZQ  (4.27) 

/  =  /x?/Zo,and  (4.28) 

e,-  =  (l/Zo)MoMri  (4.29) 


41 


The  terms  ;c,  (  1  +  £,)  and  yi(  1  +  £/)  are  the  eoordinates  jc-  and  y-  of  the  point  pu  whieh  are 
the  sealed  orthographie  projeetions  of  features  points  Mi  (refer  to  Figure  4.3).  The  z  eoordinate 
of  the  veetor  MqMI  is  the  dot  produet  MoMi.kZi  —  Zq,  so 

(l+£,-)  =  l  +  (Z,--Zo)/Zo  (4.30) 

=  Zi/Zo  (4.31) 

where  the  seeond  expression  is  found  using  Equation  4.20.  Sinee  pi  is  the  perspeetive  pro- 
jeetion  of  the  point  Pj  that  has  the  same  jc-eoordinate  as  Mi  and  a  z-eoordinate  equal  to  Zq. 
If  we  substitute  the  value  of  Xi  found  on  Equation  4.20,  the  jc-eoordinate  a:'  of  pi  is  preeisely 
determined. 


x'i  =  a:,((  1  +  £i)  =  XiZijZ^  (4.32) 

^i  =  fXijZo  (4.33) 


Finding  Poses 

Equation  4.25  and  Equation  4.26  provide  a  linear  systems  of  equations  in  whieh  the  only  un¬ 
known  are  1  and  J  for  a  given  values  of  £/ .  Solving  for  I  and  J  leads  to  i  and  j  by  normalization 
and  Zq  will  be  the  norm  of  7.  This  algorithm  is  ealled  POS  (Pose  from  Orthography  and  Sealing) 
whieh  provides  an  approximation  of  the  pose.  But  onee  i  and  j  have  been  eomputed,  more  exaet 
values  ean  be  reeomputed  using  Equation  4.26  solving  the  linear  system  again.  This  iterative 
applieation  of  POS  is  ealled  Posit  (POS  with  Iterations). 


Solving  the  System  of  Equations 

The  dot  produet  in  Equations  4.25  and  4.26  ean  be  expressed  in  terms  of  veetor  eoordinates  in 
the  UAV  eoordinate  frame. 


xr 


Yf  M 


xr 


Yf  M 


J., 


1  ^ 

=  Xi{l+ei) -xq 

(4.34) 

iT 

=  yi{l+£i)-yo 

(4.35) 
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Writing  the  last  two  equations  for  the  n  objects  points  results  in  the  following  two  linear 
systems: 


AI  =  x 
AJ  =  y 

The  matrix  A  represents  the  coordinates  of  the  objects  points  Mi  in  the  UAV  coordinate 
frame.  Since  this  study  is  dealing  with  coplanar  points,  the  matrix  A  has  rank  2  so  additional 
constraints  are  required  to  form  a  well-posed  system  of  linear  equations.  The  ambiguity  of 
solution  can  be  seen  as  illustrated  in  Figure  4.4. 


Figure  4.4:  Same  pose  from  different  planes  (from:  [25]):  In  this  case,  the  object  plane  can  have  two 
very  different  positions  that  returns  the  same  2D  image  correspondences. 


When  dealing  with  coplanar  points,  several  poses  that  are  very  different  have  the  same 
orthographic  projection  as  shown  in  Figure  4.4.  That  is,  the  Posit  algorithm  for  coplanar  points 
aims  to  find  all  the  poses  and  then  chooses  the  best  match.  More  specifically,  it  finds  two  poses 
for  each  iteration  and  either  picks  one  or  continues  with  both  options  (see  Figure  4.1.3).  This 
process  can  be  illustrated  graphically  as  a  tree  with  n  iterations  and  2n  solutions  but  only  two 
poses  are  kept  at  any  time  (though  they  may  both  be  feasible).  For  each  pose,  it  is  further 
verified  that  all  the  points  are  in  front  of  the  camera  (i.e.,  all  Z,-  >  0);  otherwise  the  pose  is 
discarded.  More  details  about  the  method  can  be  found  in  [25]. 
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Case  1:  one  feasible  solution  Case  2:  two  feasible  solutions 


(a)  (b) 

Figure  4.5:  Posit  Feasible  Solutions  (from:  [25]):  (a)  Case  1:  the  algorithm  yields  one  feasible  pose 
represented  by  +  sign  and  one  unfeasible  pose  representation  by  —  sign;  (b)Case  2:  two  feasible  solution 
but  one  of  them  is  better  then  the  other  and  they  are  represented  by  ++  and  +  sign,  respectively. 


4.2  Implementation 

This  section  addresses  the  implementation  of  the  different  pose  estimation  algorithms  explored 
in  this  thesis  and  their  corresponding  experimental  results.  A  crucial  step  before  using  any  pose 
estimation  method  is  to  compute  the  camera  intrinsic  parameters  (a.k.a  camera  calibration).  The 
purpose  of  camera  calibration  is  the  recovery  of  the  principal  distance  parameter,  i.e.,  the  focal 
length,  /  (assuming  f  =  fx  =  fy),  and  the  principal  point  coordinates  {Cx,Cy).  To  achieve  this 
goal,  an  open  source  calibration  Matlab  toolbox  [32]  was  used,  where  a  dot  pattern,  such  as 
seen  in  Figure  4.6,  is  used  to  compute  the  camera  intrinsic  parameters. 


•*••••  • 


•  • 


••  . 
•  • 


•  • 


•  • 
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Figure  4.6:  Calibration  Pattern  (from:  [32]):  The  calibration  toolbox  uses  a  planar  pattern  comprising 
randomly  distributed  black  dots  where  an  object  will  be  placed  in  different  orientations  to  help  compute 
the  camera  intrinsic  parameters. 


The  present  study  uses  the  iPhone  4s  camera  with  an  empty  plastic  bottle  in  front  of  the 
calibration  pattern  in  several  orientations,  as  shown  in  Figure  4.2. 
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•  • 


Figure  4.7:  Calibration  Images:  The  empty  plastic  bottle  is  placed  over  the  calibration  pattern  and  its 
orientation  changed  four  times  to  compute  the  camera  intrinsic  parameters. 


Use  of  the  calibration  matrix  toolbox  on  the  calibration  images  in  Figure  4.2  yields  the 
the  following  calibration  matrix: 


K  = 


/  608.7 
0 

V  0 


0 

608.7 

0 


314  \ 
320 
1  / 


(4.36) 


where  /  =  608.7  is  the  focal  length  and  the  principal  point  coordinates  are  =  314  and  Cx  — 
320. 


4.3  Simulation  Setup 

The  experiment  was  carried  out  with  synthetic  data  for  the  three  approaches  discussed  in  Sec¬ 
tion  4.1.  The  experiment  can  roughly  be  summarized  in  the  following  steps,  also  graphically 
illustrated  in  Figure  4.8: 

1.  Generate  n  random  points  in  the  object  plane 

•  In  this  thesis,  we  use  n  =  6  reference  points 

2.  Generate  three  random  angles  and  build  the  corresponding  rotation  matrix,  R 

3.  Generate  three  translations  along  the  x,  y,  and  z  axes  to  form  the  translation  vector,  T 

•  These  three  steps  were  performed  by  a  Matlab  function  called  Generate_Data(NP) 
where  NP  is  the  number  of  feature  points. 

4.  Perform  3D  to  ID  perspective  projections  using  camera  calibration  matrix  A,  the  n  object 
points,  the  rotation  R  and  translation  T : 

•  The  Matlab  function  imgsynthesis(NP, points, R,T,F0CAL_L)  is  used,  where 
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points  are  the  random  points  generated  using  the  Generate_Data  function  de¬ 
scribed  above 

•  R  is  the  rotation  matrix,  computed  from  randomly  generated  angles  and  using  the 
Matlab  function  rotation 

•  T  is  the  randomly  generated  translation  vector 

•  F0CAL_L  is  the  camera  focal  length  from  the  camera  calibration  matrix 

5.  Add  Gaussian  noise  to  the  image 

•  This  step  is  performed  using  a  Matlab  function  called  AddNoise  (ImagePoints ,  NOISE_AMPL) 
where  ImagePoints  is  a  three  dimensional  array  that  contain  the  randomly  gener¬ 
ated  features  points  and  NOISE_AMPL  is  the  noise  amplitude 

6.  Use  the  noisy  image  points  and  the  object  points  to  estimate  real  rotation  and  translation 

•  After  obtaining  the  simulated  image  points  and  their  corresponding  features  points, 
the  pose  estimation  algorithm  is  used  to  estimate  the  rotation  and  translation  vector 

•  Once  the  rotation  matrix  has  been  found,  the  three  rotation  angles  can  be  extracted 
using  the  function  GetANG.fromRotMat  (Rp)  where  Rp  is  the  calculated  rotation 
matrix 

7.  Compare  the  estimated  parameters  with  the  generated  ones  and  compute  the  errors 

•  The  computed  rotation  and  translation  parameters  are  compared  to  the  known  (ran¬ 
domly  generated)  rotation  angles  and  translation  values  which  were  used  to  compute 
the  original  image  points. 

•  The  mean  and  standard  deviation  of  errors  between  the  estimated  and  generated 
rotation  matrix  and  translation  vector  are  computed  and  recorded 

The  aforementioned  steps  are  repeated  over  100  runs  in  which  new  random  set  of  object 
points,  random  rotation  matrix,  and  random  translation  vector  are  generated  to  evaluate  the 
three  approaches  highlighted  in  Section  4.1.  The  mean  error  and  the  standard  deviation  for  the 
three  rotation  angles  as  well  as  for  the  three  translations  are  then  computed.  The  effect  of  noise 
amplitude  on  the  pose  estimation  and  the  execution  time  for  each  methods  is  also  investigated. 

The  parameter  ranges  used  in  these  numerical  studies  are  shown  in  Table  4.3. 


4.4  Experimental  Results 

This  section  presents  the  results  for  each  pose  estimation  method  by  illustrating  the  rotation  and 
translation  errors  as  well  as  their  respective  execution  times. 
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Figure  4.8:  Simulation  Setup:  Once  a  rotation  matrix  R  and  a  Translation  vector  T  are  generated  from 
random  data  they  are  used  to  project  the  features  points  on  the  image  plane  to  get  the  images  points. 
Both  features  points  and  image  points  are  fed  to  the  pose  estimation  method  to  estimate  the  object  pose 
and  compare  the  estimated  rotation  and  translation  with  the  generated  ones. 


Parameter 

Min 

Max 

Rotation 

Translation  [X,Y,Z] 

-njl 

0 

njl 

1000 

Object  coordinates 

X 

-40 

40 

Y 

-40 

40 

Z 

0 

0 

Table  4. 1 :  Parameter  ranges  for  randomly  generated  data:  The  rotation  angles  are  defined  to  be  between 
-n  and  n,  the  translation  for  X,  Y,  and  Z  to  be  between  zero  and  100.  The  object  feature  points  are 
defined  between  —40  and  40  for  both  X  and  Y  and  zero  for  Z  coordinates  since  the  study  is  dealing  with 
coplanar  features  points. 


4.4.1  Non-iterative  Pose  Estimation  Algorithm 

The  EPFL  algorithm  was  executed  over  100  random  runs  where  three  rotations  angles,  three 
translation  values,  and  six  features  points  were  randomly  generated.  The  estimated  rotation 
angles  were  found  to  be  very  accurate,  where  their  mean  error  (in  degrees)  was  around  zero  for 
roll,  pitch  angle,  and  yaw  angle,  as  reported  in  Figure  4.9.  Concerning  the  translation  estimation 
error,  the  EPFL  non-iterative  algorithm  provided  encouraging  results,  where  the  highest  mean 
error  was  0.390  m  for  the  Z  translation  (see  Figure  4.10). 
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R3  Rotation  error  for  100  runs:  Mean  =-2-8258e-008  &STD  =  4  8204e-007 
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Figure  4.9:  EPFL  Algorithm  Rotation  Angles  Error:  This  figure  plots  rotation  errors  for  each  rotation 
angle  as  a  function  of  number  of  runs  with  a  fixed  number  of  feafure  poinfs  and  a  noise  amplifude  equal 
fo  one  pixel.  The  mean  error  and  sfandard  deviafion  of  error  are  computed  over  fhese  100  random  runs. 


Z-Translation  error  for  100  runs;  Mean  =0.47649  &STD  =  4.3586 
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Eigure  4. 10:  EPEE  Algorifhm  Translafions  Error:  This  figure  plofs  fhe  error  for  fhe  fhree  franslafions 
along  X,  Y,  and  Z  axes  as  a  funclion  of  number  of  runs  wifh  a  fixed  number  of  feafure  poinfs  and  a  noise 
amplifude  equal  fo  one  pixel.  The  mean  error  and  sfandard  deviafion  of  error  are  compufed  over  fhese 
100  random  runs 


The  computation  cost  of  this  method,  shown  in  Figure  4.11,  remained  almost  the  same 
around  0.025  seconds  for  100  random  runs  except  for  some  outliers  that  does  not  exceed  0.2 
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seconds.  (These  few  longer  computation  times  may  be  due  to  the  performance  of  the  laptop 
computer  used  to  perform  these  tests.) 


Mean  Execution  Time  =  0.0068491  &STD  =  0.025936 
0.9 

0.8 
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£  0.4 
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Figure  4.11:  EPFL  Algorithm  Execution  Time:  This  figure  plots  execution  time  (in  milliseconds)  for 
each  run  as  well  as  the  overall  mean  and  standard  deviation  for  the  execution  time. 


4.4.2  Posit  Algorithm 

This  algorithm  was  executed  over  a  100  random  runs  where  we  randomly  generated  three  rota¬ 
tions  angles,  three  translations,  and  six  features  points.  The  estimated  result  for  yaw  angle  was 
very  accurate,  having  a  mean  error  around  zero  for  all  100  runs.  However,  the  estimated  result 
for  pitch  and  roll  angles  were  not  accurate,  having  a  standard  deviation  of  15.54  and  25.65  de¬ 
grees,  respectively  (Figure  4.12).  Concerning  the  translation  estimation  error,  acceptable  results 
were  obtained  for  Z-translation,  but  not  for  X  and  Y  translations.  This  algorithm  needs  further 
tuning  to  get  better  results  to  be  competitive  with  the  EPFL  algorithm. 

The  computation  cost  of  this  method  remained  almost  the  same  around  0. 1  seconds  for 
100  random  runs  except  for  some  outliers  that  does  not  exceed  0.2  seconds  (Figure  4.14). 

4.4.3  Homography-Based  Pose  Estimation  Algorithm 

This  algorithm  was  executed  over  100  random  runs  where  three  rotations  angles,  three  trans¬ 
lation,  and  six  features  points  were  randomly-generated.  The  estimated  result  for  yaw  angle 
was  very  accurate,  with  a  mean  error  and  standard  deviation  near  zero.  However,  the  estimated 
results  for  pitch  and  roll  angles  were  not  acceptable,  having  standard  deviations  of  25.48  and 
28.78  degrees,  respectively,  as  shown  in  Figure  4.15.  The  translation  estimation  error  for  Z 
translation  was  found  to  be  acceptable,  but  not  those  for  X  and  Y  translations,  which  had  a 
standard  deviation  of  153.52  and  66.02  meters,  respectively  (see  Figure  4.16).  The  implemen¬ 
tation  of  this  algorithm  may  need  further  improvement  and  optimization  to  get  better  results  as 
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R3  Rotation  error  for  100  runs;  Mean  =3.7566e-007  &STD  =  2.9642e-006 


I  0  5 


E 

3  o' - ■ - - - 1 - 1 - 1 - - - 1 

■20  0  20  40  60  80  100  120 

Rotation  error 


R2  Rotation  error  for  100  runs:  Mean  =3.7491  &STD  =  26.6506 
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R1  Rotation  error  for  100  runs:  Mean  =-0. 39075  &STD  =  15.5471 
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Figure  4.12:  Posit  Algorithm  Rotation  Error:  This  figure  plots  rotation  errors  for  each  rotation  angle  as 
a  function  of  number  of  runs  with  a  fixed  number  of  feafures  poinfs  and  a  noise  amplifude  equal  fo  one 
pixel. 
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Figure  4.13:  Posit  Algorithm  Translation  Error:  This  figure  plots  error  for  the  three  translation  alon 
X,  Y,  and  Z-axis  as  a  function  of  number  of  runs  with  a  fixed  number  of  features  points  and  a  noise 
amplitude  equal  to  one  pixel. 


compared  to  the  EPFL  algorithm.  Finally,  the  eomputation  eost  of  this  method  remained  eon- 
sistently  around  0.15  seeonds  for  the  100  random  runs,  exeept  for  some  outliers  that  reaehed 
0.35  seeonds  (Figure  4.17). 
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Mean  Execution  Time  =  0.10008  &STD  =  0.056296 
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Figure  4.14:  Posit  Algorithm  Execution  Time:  This  figure  plots  execution  time  for  each  run  as  well  as 
the  overall  mean  and  standard  deviation  for  the  execution  time.  It  was  noted  that  the  execution  time  for 
this  algorithm  is  random. 


R2  Rotation  error  for  100  runs:  Mean  =-2.0801  &STD  =  25.4804 
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Figure  4.15:  Homography  Algorithm  Rotation  Error:  This  figure  plots  rotation  errors  for  each  rotation 
angle  as  a  function  of  number  of  runs  with  a  fixed  number  of  feafures  poinfs  and  a  noise  amplitude  equal 
to  one  pixel. 


4.5  Summary  of  Results 

The  three  pose  estimation  algorithms  were  tested  over  a  100  randomly  generated  parameters 
using  six  objeet  features  points.  The  EPFL  algorithm  showed  very  aeeurate  results  in  estimating 
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Z'Translation  error  for  100  runs:  Mean  =-0.2274  &STD  =  4.8666 
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Figure  4.16:  Homography  Algorithm  Translation  Error:  This  figure  plots  error  for  the  three  translation 
alon  X,  Y,  and  Z-axis  as  a  function  of  number  of  runs  with  a  fixed  number  of  features  points  and  a  noise 
amplitude  equal  to  one  pixel. 
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Figure  4.17:  Homography  Algorithm  Execution  Time:  This  figure  plots  execution  time  for  each  run  as 
well  as  the  overall  mean  and  standard  deviation  for  the  execution  time. 

the  pose  for  these  eoplanar  points.  The  other  two  algorithms,  Posit  and  the  homography -based 
approaehes,  showed  some  aeeeptable  results  on  estimating  Z  translation  and  yaw  angle  values, 
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but  failed  to  provide  a  good  estimate  for  the  remaining  parameters,  highlighting  the  need  for 
further  improvement  and  tuning.  Table  4.2  summarizes  this  study’s  findings. 


Estimated  parameters 

EPFL 

Posit 

Homography 

roll  STD  (degrees) 

0.315 

15.84 

2.96  X  10-^ 

pitch  STD  (degrees) 

3.47  X  10-*^ 

6.30 

25.65 

yaw  STD  (degrees) 

0.375 

6.402  X  10^^ 

15.54 

X-Translation  STD  (meters) 

3.814 

59.9 

151.92 

T-Translation  STD  (meters) 

5.57 

160.77 

153.61 

Z-Translation  STD  (meters) 

3.92 

6.69 

4.11 

Mean  execution  time  (seconds) 

0.0057 

0.124 

0.56 

Table  4.2:  Summary  of  Results:  In  comparing  the  performance  of  each  algorithm  in  estimating  rotation 
angles  and  translation  vectors,  it  was  noted  that  EPFL  gives  very  accurate  results  and  near  real-time 
execution. 
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CHAPTER  5: 

Conclusions  and  Recommendations 


This  chapter  presents  the  results  achieved  in  this  thesis,  the  limitations  and  problems  faced,  and 
then  highlights  the  eventual  avenues  for  future  research. 

5.1  Summary  and  Conclusions 

This  thesis  presented  different  computer  vision  algorithms  for  both  UAV  detection  in  still  im¬ 
ages  and  pose  estimation  of  that  UAV.  The  first  task  was  addressed  by  investigating  and  imple¬ 
menting  two  computer  vision  algorithms  for  UAV  detection.  The  first  algorithm  was  based  on 
using  edge  detection  and  image  smoothing.  This  approach  gave  a  high  detection  rate,  but  only 
works  under  the  assumption  that  there  is  no  other  object  in  the  scene  having  the  same  size  as  the 
target  UAV.  The  second  algorithm  applied  morphological  filtering  and  color-based  detection, 
where  red  color  markers  were  placed  on  the  surface  of  the  target  UAV  in  an  arbitrary  configura¬ 
tion.  This  method  showed  promising  results  but  suffered  from  requiring  constant  illumination, 
lighting  conditions,  and  target  orientation,  thereby  limiting  its  robustness  in  realistic  contexts. 

The  pose  estimation  task  was  addressed  using  three  pose  estimation  algorithms.  The 
first  one  used  planar  homography,  which  showed  barely  acceptable  results;  the  translation  stan¬ 
dard  error  was  around  150m  which  is  not  acceptable  in  a  real  UAV  avoidance  (or  engagement) 
application.  This  inaccuracy  could  be  due  to  the  incorrect  assumption  that  feature  points  and 
their  image  points  are  related  via  planar  homography  and  may  need  better  implementation  and 
improvement.  The  second  approach  was  based  on  a  non-iterative  pose  estimation  algorithm  pre¬ 
sented  by  researchers  from  EPFL.  This  algorithm  gave  good  results  for  both  rotation  and  trans¬ 
lation  (refer  to  Figure  4.9)  and  can  work  for  both  co-planar  and  non-coplanar  feature  points.  The 
last  algorithm  studied  represents  a  well-known  Pose  from  Orthographic  Projection  and  Scaling 
with  Iterations  (Posit)  approach;  however,  we  find  that  although  this  algorithm  gives  acceptable 
results  for  the  rotation  angles,  the  translation  vector  achievable  renders  this  approach  less  useful 
for  real-time  UAV  pose  estimation  purposes. 

5.2  Recommendations  and  Future  Work 

This  thesis  can  be  considered  as  an  initial  proof  of  concept  and  a  foundation  for  the  implemen¬ 
tation  of  computer  vision  algorithms  for  both  UAV  detection  and  pose  estimation.  There  are 
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several  issues  to  be  addressed  and  future  researeh  avenues  to  be  considered.  Matlab  provided 
a  simulation  and  a  test  environment  in  the  lab  environment.  That  is,  an  alternative  implemen¬ 
tation  would  be  using  OpenCV  (open  computer  vision)  for  real-time  implementation  and  faster 
computation  since  OpenCV  has  precompiled  libraries  that  may  work  better  for  real  image  data. 

Another  avenue  for  future  research  would  be  to  design  and  build  light-based  beacons 
(e.g.,  LEDs)  mounted  to  the  UAV  body  and  test  the  detection  and  pose  estimation  in  field 
experiment  conditions.  In  addition,  it  may  be  possible  to  investigate  and  test  different  detec¬ 
tion  approaches  such  as  the  use  of  RANdom  SAmple  Consensus  (RANSAC)  algorithms  [41], 
implementation  of  Scale-Invariant  Feature  Transform  (SIFT)  features  [42],  and  optical  flow 
approaches  that  have  been  shown  to  also  have  promising  results  for  such  applications. 
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Appendix;  Selected  Matlab  Source  Code 


All  source  code  developed  and  used  for  this  thesis  can  be  found  online  at  https  :  //faculty . 
nps .  edu/thchung  under  the  Software  seetion. 


Frame  Grabbing  Function 

This  function  converts  a  MPEG-2  video  to  a  sequence  of  frame  in  JPEG  format  at  a  rate  of  30 
frame  per  seconds  (EPS). 


function  f ramegrabber ( vidrostream ) 

%movie2frames  generate  frames  from  the  given  video 
%  vid  is  the  video  file  name 
%  USAGE  :>>  framegrabber{'video.avi') 

%mmreader  with  the  read  method  used  to  read  video  data  from  a  multimedia 
%file  in  our  case  ’  avi  '  file 

readerobj  =  mmreader ( vidrostream ) ; 
vidFrames  =  read ( readerobj ) ; 

%  get  the  number  of  frames  using  EPS  =  30  (EPS  frame  per  seconds 
numFrames  =  get  (readerobj  ,  ’  numberOfFrames  ’ )  ; 

%  initialisation  of  wait  bar  as  a  debugging  tool 

h  =  waitbar(0,  'Grabbing  Frames . '); 

for  k  =  1  :  numFrames 

%  get  the  individual  frames  from  the  frame  sequences 
mov(k).cdata  =  vidFrames(:  ,k); 

mov ( k )  .  colormap  =  []; 

%imshow(mov(k)  .  cdata  )  ; 

imagename=strcat (int2str(k),  ’.jpeg'); 
cd  (  ’  .  \  frames  \  ' )  ; 

%  write  each  frame  to  the  'frames  '  folder 
imwrite  (mov(k)  .  cdata  ,  strcat  (  '  frame  '  ,  imagename  ) )  ; 

waitbar (k/numFrames ,h) ; 
fprintf(l,  'saving  frame  %d  :  \  r  '  ,  k)  ; 

%ex  tract  Components  (mov{k)  .  cdata  )  ; 

cd  . . 

end 

fprintf(l,  'Done  :  number  of  frames  %d\r',  numFrames); 
end 


Image  Cropping  Function 

This  function  extract  a  Region  Of  Interest  (ROI)  defined  by  the  upper  right  eorner  of  the  frame 
where  both  x  and  y  are  positive. 


57 


%######################################################################## 
%  cropping  :  explore  the  frame  folder  and  crop  down  them  to  reduce  their 
%  size  and  keep  the  pixels  that  represent  the  UAV  and  get  read  of  the 
%  outlier  to  help  the  detection  process 
%####################################################################### 
%  go  to  the  frame  folder  and  list  the  .png  files 
cd  (  '  .  \  images  \  ' )  ; 
files  =  dir ( ' * . png ’ ) ; 

%  sort  these  images  and  count  them 
files  =  s ort ( {  f iles . name } ) ; 
nframes  =  size (files ,2) ; 

h  =  waitbar  (0  Cropping  Frames . '); 

%  loop  over  the  number  of  frames  and  resize  them 
for  k  =  1 : nframes 

imagename=s treat (int2str(k),  png  ’ ) ; 

imgn  =  strcat(  'frame  ’  ,imagename); 
rgb  =  imread ( imgn ) ; 

%  change  directory  to  copped  folder  where  we  will  save  the  resized  images 
cd (  '  .  \ cropped \  ' ) ; 

rgb  =  rgb(1200: 1400 ,800: 1800  ; 

imwr ite ( rgb , imgn ) ; 
waitbar (k/nframes ,h) ; 

cd  . . 
end 

close (h) ; 
cd  . . 

fprintf(l,  'Done  cropping  ...\r'); 


Generate_Data  function 

This  function  generates  random  planar  feature  points,  random  rotation  angles,  and  random 
translation  veetor  to  be  used  for  image  synthesis  and  to  test  the  pose  estimation  algorithm. 


function  [points , Rotang , Trans ]  =Generate_Data(NP) 

%^+++  Function  created  by  Hajri  Riadh 

%^+++  This  function  generates  random  planar  feature  points 
%++++  angles  ,  and  random  translation  vector 
%++++  Input:-! 

NP:  number  of  points 
%^-+++  Output:-! 

%-t"i"i"i-  points  :  matrix  of  randomly  generated  points 
%^-+++  Rotang:  Three  rotation  angles 
%-t"i"i"i-  Trans:  Translation  vector 

%###  we  consider  here  a  80x80  square  (unit:  cm) 

X_min  =  —  40; 

X_max  =40; 

Y_min  =  —  40; 


random  rotation 
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Y_max  = 

40; 

for  i  =  ] 

:NP 

points ( 

i,:)  = 

[  ( X_max— X_min ) *  rand  ( 1  , 

1 ) +X_min  ,  ( Y_max— Y 

_min ) *  rand ( 1  , 1 ) +Y_min ,  0]; 

end 

ang_lo 

=  —pi  /2 

;  ang_hi  =  pi /2 ; 

%  define  an 

gle 

interval 

Rotang 

=  [(ang_ 

hi— ang_lo ) *  rand (3  , 1 )  + 

ang_lo ] *  1 80/ pi 

; 

trans_lo  =  0; 

trans_hi  =  1000; 

%  interval 

for 

translation 

Trans  = 

(trans 

_hi— trans_lo ) *rand (3 ,1) 

+  trans_lo ; 

Image  Synthesis  Function 

This  function  performs  a  perspective  projection  of  the  n  feature  points  to  get  the  corresponding 
image  points  using  the  randomly  generated  rotation  matrix  and  translation  vector. 


function  [ ImagePoints ]  =  imgsynthesis ( Npoint , WorldPoint s , rotat ionM , trans , Flength ) 
%++++  function  created  by  Hajri  Riadh 

%++++  This  function  perform  a  perspective  projection  of  the  feature  points 
%++++  to  get  the  corresponding  image  points 

%++++  Npoint:  number  of  features  points.  In  our  Npoint  =  6 
%H-+++  WorldPoints:  a  matrix  of  feature  point  coordinates 
%++++  rotationM  the  randomly  generated  rotation  matrix 
%++++  trans:  the  randomly  generated  translation  vector 
%++++  Flength  :  camera  focal  length 

%++++  ImagePoints:  a  mtrix  containing  the  resulting  image  points 


%^++  add  the  translation  for  x,y,  and  z  coordiantes 

moved  =  zeros (Npoint ,3) ; 

moved ( : , 1 ) =moved (: ,l)+trans(l) ; 

moved{: ,2)=moved(: ,2)+trans(2) ; 

moved(: ,3)=moved(: ,3)+trans(3) ; 


%+++  perform  the  perspective  projection 
for  i  =  l  :Npoint 
for  j  =1 :3 

for  k=  1 :3 

moved ( i , j )=  moved ( i , j )+rot at ionM (j ,k)*WorldPoints(i,k) ; 
end 


end 

end 

for  i  =  l  :Npoint 
for  j  =1 :2 

ImagePoint s ( i , j ) =  Flength*moved ( i , j ) /moved ( i , 3 ) ; 

end 

end 
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end 


Rotation  Angle  Extraction  from  Rotation  Matrix 

This  function  was  created  based  on  the  paper  of  Gregory  G.  Slabaugh  it  computes  the  euler 
angles  from  the  give  rotation  matrix. 


function  [ rotUC , rotWC , rotZC ]  =  Get ANG_f romRotMat (Rm) 

%++++ 

created  by  Hajri  Riadh 

%++++ 

inspired  from  paper  of  Gregory  G.  Slabaugh  Computing 

Euler  angles 

%++++ 

from  a  rotation  matrix" 

%++++ 

url  :  http  :/  /  gregslabaugh  .  name  /  publications  /  euler  .  pdf 

%++++ 

this  function  has  as  imput  the  rotation  matrix  and  it 

outpiut  are  the 

%++++ 

three  rotation  angles 

rotUC 

=  atan  (Rp  (3  ,2) /Rp  (3  ,3 ) )  ; 

rotWC 

=  atan  ( —  (Rp  { 3  , 1  )*  sin  ( rotUC  ))/Rp(3,2)); 

rotZC 

=  atan  ( Rp (2 , 1 ) /Rp ( 1  , 1 )  )  ; 

rotUC 

=  rotUC  *  1 80/ pi  ; 

rotWC 

=  rotUC  *  1 80/ pi  ; 

rotZC 

=  rotZC  *  1 80/ pi  ; 

%+++  End  of  function 

Color  Detection  Function 

This  function  performs  RED  color  detection  by  extracting  color  threshold,  then  applying  mor¬ 
phological  closing  operation  and  image  mask  to  isolate  and  localize  the  red  markers. 


function  ColorDetect  () 

%++++++++++++++++++++  RED  marker s  Detection  ++++++++++++++++++++++++++++ 

%^-+++  This  function  was  adapted  by  Hajri  Riadh  from  the  tutorial  in 
%++++  http  :  /  /www.  math  works  .  com  /  matlabcentral/  fileexchange 
%++++  /285  1 2  —  simple —color —detection —by— hue  done  by  Image  Analyst 
%++++  Copyright  (c)  2010,  Image  Analyst  All  rights  reserved 

%++++  This  function  perform  RED  color  detection  by  extracting  color  threshold  , 
%++++  morphological  closing  and  image  maske  to  keep  the  red  band  and  desired 
%++++  object  pixel  size 


clc  ;  %  Clear  command  window, 

clear;  %  Delete  all  variables. 

close  all;  %  Close  all  figure  windows  except  those  created  by  imtool  . 
%  imtool  close  all:  %  Close  all  figure  windows  created  by  imtool. 
workspace;  %  Make  sure  the  workspace  panel  is  showing. 


then  apply 
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close  all  ; 
fontSize  =  16; 
figure  ; 

%  Maximize  the  figure  . 

set(gcf,  'Position',  get(0,  ’  ScreenSize  '  )  )  ; 

%  Change  the  current  folder  to  the  folder  of  this  m-file  . 

%  (The  line  of  code  below  is  from  Brett  Shoelson  of  The  Mathworks.) 
if (-is deployed) 

cd  (  f  ilepart  s  ( which  ( mf  ilename  ) ) )  ; 

end 

%  Change  default  directory  to  the  one  containing  the  default  image 
%  folder 

or iginalFolder  =  pwd; 

folder  =  'C:\Users\dragonisto\studieslast\computer  vision\dotm  Files\UAV'; 
if  - e xi s t ( folder ,  'dir') 
folder  =  pwd; 

end 

cd ( folder ) ; 

%  Browse  for  the  image  file  . 

[  baseFileName  ,  folder]  =  uigetfile(  'Specify  an  image  file’); 

f ullImageFileName  =  fullfile (folder ,  baseFileName); 

%  Set  current  folder  back  to  the  original  one. 
cd ( originalFolder ) ; 

%end 

%  Read  in  image  into  an  array  . 

[rgbimage  storedColorMap ]  =  imread ( f ullImageFileName ) ; 

[rows  columns  numberOf ColorBands ]  =  s i ze ( rgbimage ) ; 

%  If  it  '  s  monochrome  (indexed),  convert  it  to  color. 

%  Check  to  see  if  it  '  s  an  8— bit  image  needed  later  for  scaling), 
i f  strcmpi ( class ( rgbimage ) ,  ’ u int8  ' ) 

%  Flag  for  256  gray  levels . 
eightBit  =  true; 

else 

eightBit  =  false; 

end 

if  numberOf ColorBands  ==  i 

if  isempty  (  StoredColorMap ) 

%  Just  a  simple  gray  level  image,  not  indexed  with  a  stored  color  map. 

%  Create  a  3D  true  color  image  where  we  copy  the  monochrome  image  into  all  3  (R,  > 

G,  &B)  color  planes. 

rgbimage  =  cat(3,  rgbimage,  rgbimage,  rgbimage); 

else 

%  It  ’  s  an  indexed  image. 
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rgbimage  =  ind2rgb ( rgbimage ,  storedColorMap) ; 

%  ind2rgb()  will  convert  it  to  double  and  normalize  it  to  the  range  0—1. 
%  Convert  back  to  uintS  in  the  range  0  —  255,  if  needed, 
if  eightBit 

rgbimage  =  uint8(255  rgbimage); 

end 

end 

end 


%  Extract  out  the  color  bands  from  the  original 
%  into  3  separate  2D  arrays  ,  one  for  each  color 
redBand  =  rgbimage (:,  1); 

greenBand  =  rgbimage (:,  2); 

blueBand  =  rgbimage (:,  3); 


image 

component . 


[countsR,  grayLevelsR]  =  imhist ( redBand ) ; 
maxGLValueR  =  find(countsR  >  0,  1,  'last  '); 
maxCountR  =  max( countsR) ; 


[countsG,  grayLevelsG]  =  imhist ( greenBand ) ; 
maxGLValueG  =  find(countsG  >  0,  1,  'last  '); 
maxCountG  =  max( countsG) ; 


[countsB,  grayLevelsB]  =  imhist 
maxGLValueB  =  find(countsB  >  0, 
maxCountB  =  max( countsB ) ; 


( blueBand ) ; 

1  ,  'last'); 


%  Determine  the  max  gray  level  for  the  three  bands 
maxGrayLevel  =  max ([ maxGLValueR ,  maxGLValueG,  maxGLValueB]); 


%  define  threshold  for  each  color  band,  our  focus  is  on  red  color 
redThresholdLow  =  graythresh ( redBand ) ; 
redXhresholdHigh  =  255; 
greenThresholdLow  =  0; 

greenThresholdHigh  =  graythresh(greenBand) ; 
blueThresholdLow  =  0; 

blueThresholdHigh  =  graythresh ( blueBand ) ; 

%  if  it  is  8  bit  image  we  perform  a  suitable  conversion 
if  eightBit 

redThresholdLow  =  uintS ( redThresholdLow  *  255); 
greenThresholdHigh  =  uintS ( greenThresholdHigh  *  255); 
blueThresholdHigh  =  uintS ( blueThresholdHigh  *  255); 

end 


%  Now  apply  each  color  band's  particular  thresholds  to  the  color  band 
redMask  =  (redBand  >=  redThresholdLow)  &  (redBand  <=  redThresholdHigh) ; 
greenMask  =  (greenBand  >=  greenThresholdLow)  &  (greenBand  <=  greenThresholdHigh); 
blueMask  =  (blueBand  >=  blueThresholdLow)  &  (blueBand  <=  blueThresholdHigh); 
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redOb j ectsMask  =  uintS ( redMask  &  greenMask  &  blueMask) ; 

%  define  the  smallest  acceptable  area  should  be  adjusted  with  the  light 
%  beacon  size  on  the  image 

smallestAcceptableArea  =  10; 

%  removes  all  connected  components  that  have  fewer  than  'smallestAcceptableArea  '  pixels 
redObj ectsMask  =  uint8(bwareaopeii(red0bjectsMask ,  smallestAcceptableArea)); 

%  define  a  morphological  structuring  element  having  a  disk  shape  with  a 
%  radius  =  2  pixels 

structuringElement  =  strel(  '  disk  '  ,  2); 

%perform  morphological  close  operation  that  enlarge  the  boundaries  of  foreground  (bright)-f-> 
regions  in  an  image  (and 
%shrink  background  color  holes  in  such  regions  ) 
redObj ectsMask  =  imclose ( redObj ectsMask ,  structuringElement); 

%  fill  background  holes  created  by  the  preceding  operations 
redObj ectsMask  =  uintS ( imf ill ( redObj ectsMask ,  'holes')); 
redObj ectsMask  =  cast ( redObj ectsMask ,  class ( redBand )) ; 

%  prepare  mask  for  each  color  band 
maskedImageR  =  redObj ectsMask  .*  redBand; 
maskedImageG  =  redObj ectsMask  greenBand; 
maskedImageB  =  redObj ectsMask  .*  blueBand; 

%concatenate  the  three  color  band's  mask 

maskedRGBImage  =  cat (3,  maskedImageR,  maskedImageG,  maskedImageB); 

imshow ( maskedRGBImage ) ; 
fontSize  =  13; 

caption  =  sprintf('Red  Beacom  only'); 
t i 1 1  e ( caption ,  'FontSize',  fontSize); 

%  save  the  resulting  image  to  the  disk  for  further  processing 

Img  =  maskedRGBImage ; 

tmp  =  rgb2gray (Img) ; 

imwrite  (tmp  ,  '  tempimage  .gif  ' )  ; 

return  ; 

%++++++++++++++++++  End  of  ColorDetect  function  +++++++++++++++++++++++++++ 


Connected  Components  Extraction  Function 

The  purpose  of  this  function  is  to  isolate  connected  component  in  the  input  image,  extract  and 
then  locate  the  centroid  of  each  connected  component. 


function  conect edcomponent ( image ) 

^+++++++++++++++++  Connected  component  extraction  +++++++++++++++++++++++ 

%  the  purpose  of  this  function  is  to  isolate  connected  component  in  the  input  image, 
%  and  then  locate  the  centroid  of  each  connected  component 
I  =  imread ( image )  ;  %  read  the  input  image 


extract 
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gray  =  medf  ilt2  ( I ,  [  9  10]);  %  perform  a  median  filter  to  the  image  to  get  ride  of  noise  and 

undesired  effects 

level  =  graythresh ( gray )  ;  %  Image  thresholding  using  automatic  threshold 
bw  =  gray>  0.2; 

imwrite  ( bw  ,  'bw .  jpg  ’  )  ;  %  save  the  binary  image  in  a  temporary  image 

[L,num]  =  bwlabel  (bw)  ;%bwlabel  returns  a  matrix  L,  of  the  same  size  as  BW,  containing  labels 
for  the 

%connected  objects  in  BW. 

M  =  im2uint8 (L/num) ; 

imwrite (M, jet,  'label. jpg');%  save  the  labeled  image  into  label.jpg 
for  i  =  1 : num 

area(i)  =  bwarea  ( L==i )  ;  %  estimates  the  area  of  the  objects  in  binary  image 

end 

%  keep  object  with  area  >210 
X  =  find  (area>210)  ; 

s  =  regionprops  (L,  'PixelldxList  ’  ,  'PixelList  '  ,  'Centroid  '); 
centroids  =[ s . Centroid ] ; 

%  show  the  isoled  connected  component  as  well  as  their  centeroid 
imshow ( L ) 

t  i  1 1  e  (  ’  Red  Blobs  center  of  mass  'FontSize',  13); 

hold  on 

for  k  =  1 : num 

idx  =  s(k) . PixelldxList ; 
pixel_values  =  double ( L ( idx )) ; 
sum_pixel_values  =  sum(pixel_values ) ; 

X  =  s(k) . PixelList (: ,  1); 

y  =  s ( k ). PixelList (: ,  2); 

%  display  the  centroid  of  each  connected  component  with  each  index 
%  "Point  x" 

xbar  =  sum(x  .*  pixel_values )  /  sum_pixel_values ; 
ybar  =  sum(y  .*  pixel_values )  /  sum_pixel_values ; 

plot (xbar,  ybar,  '*') 
text(xbar+2,  ybar,  'Point') 

end 

hold  off 
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